ABSTRACT


The  United  States  Army  Recruiting  Command  requires  tools  to  quantify  the 
impact  of  factors  in  the  recruiting  environment,  to  identify  differences  in  the  recruiting 
processes  across  its  five  regional  subordinate  units,  and  to  measure  the  effectiveness  of  its 
policies  and  resource  expenditures.  This  thesis  examines  recruiting  data  for  the  “high- 
quality”  male  demographic  from  July  1992  to  September  1997.  It  uses  multivariate  time 
series  analysis  to  predict  the  number  of  enlistment  contracts  signed  in  a  month  as  a 
function  of  fifteen  exogenous  and  endogenous  factors  plus  monthly  indicators.  A 
stepwise recursion using bootstrap simulation is developed to identify significant
factors  in  the  multivariate  time  series.  The  significant  factors  in  the  reduced  models  are 
compared  to  those  contained  in  models  developed  in  previous  studies.  The  models  are 
also  used  to  create  nine-month  projections  of  recruiting  production,  which  are  compared 
to  known  production  figures  from  test  set  data  to  determine  forecast  accuracy.  The  results 
of  this  research  support  the  intuition  that  the  influential  factors  differ  by  region.  The 
stepwise  model  reduction  recursion  using  bootstrap  simulation  offers  potential  for  further 
refinement  and  application. 
EXECUTIVE SUMMARY


The  United  States  Army  is  experiencing  the  greatest  recruiting  shortages  since  the 
inception  of  the  All-Volunteer  Force  in  1973.  The  service  faces  unprecedented 
competition  for  young  people  as  unemployment  is  at  its  lowest  level  in  thirty  years  and 
college  attendance  rates  are  the  highest  in  American  history.  The  U.S.  Army  Recruiting 
Command  (USAREC)  is  the  organization  charged  with  recruiting  civilians  for  service  in 
the  Army.  USAREC  requires  tools  to  quantify  the  impact  of  factors  in  the  recruiting 
environment,  identify  differences  in  the  recruiting  processes  across  its  five  regional 
subordinate  units,  and  measure  the  effectiveness  of  its  policies  and  resource  expenditures. 
This thesis examines recruiting data from July 1992 to September 1997, which was a
very  dynamic  period  for  the  Army  and  Recruiting  Command.  The  scope  is  limited  to  the 
high-quality  male  demographic.  The  Army  defines  a  high-quality  recruit  as  one  who 
scored  above  the  50th  percentile  on  the  Armed  Forces  Qualification  Test  and  who  is  a 
high  school  graduate  or  general  equivalency  diploma  holder. 

A  considerable  amount  of  research  has  been  dedicated  to  the  topic  of  Army 
recruiting.  One  of  the  goals  of  this  thesis  is  to  validate  factors  from  previous  models  on 
more  current  data.  Many  observers  have  proposed  new  or  changing  influences  on  the 
recruiting  environment.  A  further  objective  of  this  thesis  is  to  explore  these  suppositions 
quantitatively  by  combining  new  factors  with  ones  previously  shown  to  be  significant. 
Enumeration  of  the  differences  in  the  recruiting  environment  throughout  the  country  is 
another  objective.  Finally,  this  thesis  aims  to  develop  an  accurate  tool  for  predicting 
recruiting  production  that  can  be  used  by  Army  leaders. 

Multivariate  time  series  analysis  is  used  to  predict  the  number  of  enlistment 
contracts  signed  in  a  month  as  a  function  of  exogenous  and  endogenous  factors  plus 
monthly  indicators.  Fifteen  factors  are  initially  included  for  examination  in  this  study  as 
predictive  variables.  They  are  selected  based  on  their  appearance  in  previous  models  or  in 
recent  research.  Autoregressive  moving  average  (ARMA)  models  are  developed  to 
produce  residuals  with  a  suitable  structure  for  bootstrapping.  The  bootstrap  is  used  to 
overcome  the  difficulties  in  determining  significant  factors  presented  by  the  short 
duration of the recruiting data time series. This technique allows resampling from within
the  existing  data  to  provide  robustness  in  the  factor  determination  process.  A  stepwise 
recursion  is  developed  to  eliminate  factors  from  the  time  series  models  that  are  not 
statistically  significant.  The  factors  remaining  in  the  reduced  models  are  compared  to 
those  found  to  be  significant  in  past  research.  The  developed  models  are  also  used  to 
create  nine-month  projections  of  recruiting  production.  The  results  are  then  compared  to 
known  production  figures  from  test  set  data  to  determine  forecast  accuracy  levels. 
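
The factor-screening idea described above can be illustrated with a minimal sketch. It is not the procedure developed in this thesis: it substitutes ordinary least squares for the ARMA error models, assumes the residuals are approximately independent, and runs on synthetic data. The function names (fit_ols, bootstrap_intervals, backward_eliminate), the 95% percentile interval, and the rule for choosing which factor to drop are all assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def bootstrap_intervals(X, y, n_boot=500, alpha=0.05, seed=0):
    """Percentile intervals for each coefficient from resampled residuals."""
    rng = np.random.default_rng(seed)
    beta = fit_ols(X, y)
    fitted = X @ beta
    resid = y - fitted
    resid = resid - resid.mean()          # center residuals before resampling
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        draws[b] = fit_ols(X, y_star)
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1 - alpha / 2, axis=0)
    return beta, lower, upper

def backward_eliminate(X, y, names, **kwargs):
    """Repeatedly drop the weakest factor whose interval contains zero."""
    keep = list(range(X.shape[1]))        # column 0 is the intercept, never dropped
    while True:
        beta, lower, upper = bootstrap_intervals(X[:, keep], y, **kwargs)
        candidates = [i for i, col in enumerate(keep)
                      if col != 0 and lower[i] <= 0.0 <= upper[i]]
        if not candidates:
            return [names[col] for col in keep], beta
        weakest = min(candidates,
                      key=lambda i: abs(beta[i]) / (upper[i] - lower[i]))
        del keep[weakest]

# Synthetic example (hypothetical data): one real factor and one pure-noise factor.
rng = np.random.default_rng(7)
n = 63
x_real = rng.normal(size=n)
x_noise = rng.normal(size=n)
y = 2.0 + 3.0 * x_real + rng.normal(scale=1.0, size=n)
X = np.column_stack([np.ones(n), x_real, x_noise])
kept, coefs = backward_eliminate(X, y, ["intercept", "x_real", "x_noise"])
print(kept)
```

Resampling residuals rather than raw observations keeps the predictor values fixed, which is why a model that leaves roughly uncorrelated residuals is needed before the bootstrap is applied.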

The  final  models  indicate  that  unemployment  figures  and  high  school  graduate 
wage  levels  are  significant  factors  for  predicting  recruiting  production.  These  results  are 
consistent  with  findings  from  previous  studies.  However,  the  impact  of  these  two  factors 
is  not  clearly  interpretable  across  the  five  recruiting  brigades.  No  consistent  factors  for 
measuring  the  competition  between  the  Army  and  post-secondary  schooling  emerge  from 
the  model  development  process.  The  final  models  do  successfully  capture  the  seasonal 
nature  of  recruiting.  There  are  considerable  differences  in  the  final  model  for  each 
brigade,  indicating  that  influential  predictors  of  recruiting  production  differ  regionally. 
The  forecasts  produced  using  the  final  models  capture  the  general  behavior  of  the 
recruiting  production  series  in  the  test  period.  The  stepwise  recursion  using  bootstrap 
simulation  for  identifying  significant  factors  in  multivariate  time  series  analysis  proved  to 
be  a  useful  tool  and  offers  potential  for  further  refinement  and  application. 
I. INTRODUCTION


A.  BACKGROUND 

The  United  States  Army  is  currently  experiencing  the  greatest  recruiting  shortages 
since  the  inception  of  the  All-Volunteer  Force.  The  service  faces  many  challenges  in  the 
recruiting  realm.  Competition  for  young  people  is  unprecedented  as  unemployment  has 
recently  reached  30-year  lows.  The  nation’s  youth  have  demonstrated  a  decreasing 
propensity  to  enlist  as  measured  by  the  annual  Youth  Attitude  Tracking  Survey  (YATS) 
and  have  enjoyed  the  highest  college  attendance  rates  in  U.S.  history  (Parlier,  1999).  As 
a  result  of  these  and  other  factors,  the  Army  has  failed  to  meet  its  recruiting  requirements 
every  year  since  1997. 

The current recruiting conditions are in stark contrast to those of the early 1990s,
a  period  that  represented  unequalled  success  in  terms  of  the  quantity  and  quality  of 
soldiers  recruited  by  the  Army.  This  achievement  coincided  with  a  decreased  recruiting 
demand as the active Army force was pared down from its Cold War level of close to
750,000  to  its  current  strength  of  approximately  480,000.  As  the  force  was  reduced, 
recruiting  requirements  decreased  33%  (Asch,  1999).  Towards  the  conclusion  of  the 
drawdown,  the  Army  entered  a  “steady  state,”  meaning  that  every  soldier  who  left  the 
service  had  to  be  replaced  by  a  new  recruit.  Hence,  accession  requirements  have  actually 
increased  slightly  since  1995. 

In response to recent shortcomings in meeting recruiting objectives, the Army
Chief  of  Staff,  General  Shinseki,  declared  recruiting  “the  number  one  mission  on  his 
essential  task  list”  (Dickey,  1999).  The  organization  charged  with  the  mission  of 
recruiting  civilians  for  service  in  the  Army  is  the  U.S.  Army  Recruiting  Command 
(USAREC).  To  improve  performance,  USAREC  has  increased  recruiter  strength  15% 
since  the  beginning  of  1997.  It  is  also  offering  costly  new  enlistment  incentives. 

USAREC  is  dedicated  to  matching  people  to  Army  personnel  requirements.  Like 
many  high-tech  organizations,  the  Army  seeks  to  fill  its  ranks  with  “high-quality 
recruits.” The Army defines a high-quality recruit as one who scored above the 50th
percentile  on  the  Armed  Forces  Qualification  Test  (AFQT)  and  who  is  a  high  school 
graduate  or  general  equivalency  diploma  (GED)  holder.  Army  policies  over  the  past 
decade  have  required  that  90  to  95%  of  all  accessions  have  a  high  school  diploma  or 
GED. Since there are such demanding policy requirements for high-quality recruits, this
demographic category receives the majority of the focus and recruiting effort.

B.  STATEMENT  OF  PROBLEM 

The  troubled  status  of  recruiting  has  gained  public  attention.  The  challenges  in 
this  arena  are  well  documented  and  USAREC  is  applying  more  resources  to  achieve  its 
objectives.  Simple  allocation  of  greater  funds  to  USAREC  alone  is  not  the  answer  to  the 
service’s  manpower  shortcomings.  Precise  application  of  these  monies  is  critical.  As 
pointed  out  by  RAND  researcher  Bruce  Orvis,  “decisions  about  increases  [must  be] 
preceded by identification of specific shortages that need to be remedied” (Orvis, 1996).
In  a  period  of  increasing  competition  for  eligible  recruits,  the  Army’s  challenges  will  not 
recede  in  the  foreseeable  future.  Therefore,  USAREC  must  operate  with  the  greatest 
possible  efficiency  in  its  application  of  limited  resources. 

Under  these  conditions  USAREC  requires  tools  to  measure  the  effectiveness  of  its 
policies  and  resource  expenditures  and  to  apply  an  appropriate  balance  of  effort  across  its 
five  major  subordinate  units,  which  represent  geographical  regions  of  the  United  States. 
This  thesis  uses  multivariate  time  series  analysis  to  predict  recruiting  production  (the 
number  of  enlistment  contracts  signed  in  a  given  period)  as  a  function  of  exogenous  and 
endogenous  factors. 

1.  Research  Questions 

Models  that  address  macro-level  policies,  recruiter  distribution,  and  allocation  of 
resources  have  utility  for  USAREC.  The  following  research  questions  motivated  the 
development  of  the  time  series  models  in  this  thesis: 


a.  What  are  the  most  significant  economic,  demographic,  and  policy  predictors  of 
recruiting  success? 

b.  What  are  the  differences  between  the  five  regional  recruiting  brigades  regarding  these 
various  factors? 

c.  How  effectively  can  recruiting  production  be  predicted  using  a  multivariate  time  series 
model? 

2.  Scope  and  Assumptions 

The  models  developed  in  this  thesis  predict  production  at  the  regional  level.  The 
scope of this study is limited to the high-quality male demographic, which is the single largest
category  of  recruits  accessed.  Though  USAREC  does  recruit  from  U.S.  territories  and 
protectorates,  as  well  as  from  within  American  military  communities  based  overseas,  this 
study  addresses  only  recruiting  efforts  and  production  in  the  fifty  states  plus  the  District 
of Columbia.

The  data  used  for  this  thesis  was  compiled  by  the  Defense  Manpower  Data  Center 
(DMDC)  for  the  Navy  College  Fund  Evaluation  Study.  The  period  examined  is  from  July 
1992 to September 1997. The advantage of analyzing this period is that it was a time of
great change, which offers the potential to provide greater contrast in certain indicators
that can be exploited. The disadvantage is that it does not address current resource levels
and  economic  conditions. 

During  the  period  analyzed  by  this  study,  USAREC  underwent  two  major 
organizational  changes  with  respect  to  its  subordinate  units,  which  are  called  brigades. 
Initially,  it  maintained  a  five-brigade  structure.  During  1992,  it  changed  to  a  four-brigade 
structure,  and  then  returned  to  five  brigades  in  1995.  This  study  assumes  that  the  structure 
of  the  brigades  was  constant,  using  the  current  five-brigade  organization  and  its 
associated  geographical  boundaries. 
C. RESEARCH OBJECTIVES

A  considerable  amount  of  past  research  has  been  dedicated  to  the  topic  of  Army 
recruiting.  One  of  the  goals  of  this  thesis  is  to  validate  factors  from  previous  models  on 
more  current  data.  Many  observers  have  proposed  new  or  changing  influences  on  the 
recruiting  environment.  A  further  objective  of  this  thesis  is  to  explore  these  suppositions 
quantitatively  by  combining  new  factors  with  ones  previously  shown  to  be  significant. 
Enumeration  of  the  differences  in  the  recruiting  environment  throughout  the  country  is 
another  objective.  Finally,  this  thesis  aims  to  develop  an  accurate  tool  for  predicting 
recruiting  production  that  can  be  used  by  Army  leaders. 

D.  ORGANIZATION 

This  introduction  provides  the  objectives  and  organization  of  this  thesis.  A 
detailed overview of Army recruiting and a review of previous research on this subject are
contained in Chapter II. Chapter III describes the factors in the time series models and the
motivation  for  their  inclusion.  The  modeling  methodology  is  developed  in  Chapter  IV. 
Chapter  V  contains  the  results  and  a  discussion  of  their  implications.  Finally,  conclusions 
and  recommendations  are  provided  in  Chapter  VI. 
II. ARMY RECRUITING


A.  OVERVIEW 

1.  Mission  and  Structure 

The  Army  currently  requires  approximately  75,000  new  soldiers  a  year.  The 
Office  of  the  Deputy  Chief  of  Staff  for  Personnel  (ODCSPER)  determines  this 
requirement and passes it to the U.S. Army Recruiting Command, the organization
responsible  for  recruiting  civilians  for  service  in  the  Army.  Recruiting  Command  is 
organized  into  five  subordinate  brigades,  which  have  general  regional  responsibilities  as 
follows:  northeast,  southeast,  north  central,  south  central,  and  west.  In  many  cases,  states 
are  divided  between  different  regions.  The  current  organization’s  geographic  boundaries 
are reflected in Figure 2.1.


Figure  2.1  U.S.  Army  Recruiting  Command  Structure 


USAREC  has  approximately  1,600  recruiting  stations  where  the  business  of 
making contacts and signing contracts actually takes place. In addition to its 6,000
recruiters,  USAREC  currently  employs  an  additional  4,250  uniformed  personnel  and 
1,100  civilians  in  research  and  support  activities.  Beyond  people,  one  of  its  major 
resources  is  advertising.  In  1997,  USAREC  spent  approximately  $87.9  million  in  national 
television,  radio,  and  print  advertising  campaigns.  This  figure  does  not  include  additional 
funding  that  is  provided  to  subordinate  commanders  for  local  advertising  efforts. 

2.  High-quality  Recruit  Definition 

The  Army  seeks  to  fill  its  ranks  with  “high-quality  recruits.”  Such  people  are 
trainable  on  technically  oriented  jobs.  They  also  have  higher  contract  completion  rates, 
and  greater  retention  for  additional  contracts.  The  Army  defines  a  high-quality  recruit  as 
one  who  is  in  Test  Score  Category  (TSC)  I-IIIA,  meaning  he  or  she  scored  above  the  50th 
percentile  on  the  Armed  Forces  Qualification  Test.  Additionally,  a  high-quality  recruit  is 
in  Educational  Credential  Tier  1,  which  means  that  he  or  she  is  a  high  school  or  GED 
diploma  holder.  The  Department  of  Defense  (DOD)  and  the  Army  state  policy  objectives 
for the number of TSC I-IIIA individuals recruited. U.S. law, DOD, and the Army set
successively higher minimum requirements for high school degree holders. The Army
policy  dictating  Tier  1  accessions  has  varied  between  90  and  95%  of  all  accessions  over 
the  past  decade. 

3.  Recruiting  in  the  1990s 

The  1990s  represented  a  period  of  great  change  for  the  United  States  Army.  The 
decade  began  with  the  Cold  War  victory  and  was  followed  in  1991  with  the  victory  in  the 
Gulf  War.  Since  then,  the  Army  has  experienced  increasing  operational  tempo  with 
numerous  peacekeeping  and  humanitarian  deployments  to  Somalia,  Haiti,  Bosnia,  and 
Kosovo,  among  others.  Correspondingly,  the  1990s  represented  a  turbulent  period  for  the 
Recruiting  Command. 

The  post-Cold  War  drawdown  reduced  the  Army’s  strength  approximately  35%. 
From  1989  to  1998  accession  requirements  decreased  by  about  the  same  degree  (Asch, 
1999).  The  combination  of  a  reduced  demand  for  new  soldiers,  the  military’s  increased 
public  popularity  following  the  triumph  in  the  Persian  Gulf,  and  the  1992  economic 
recession, allowed the Army to recruit the best educated force in its history. In 1991, 98%
of  new  soldiers  were  high  school  graduates  (Eitelberg,  1994).  Under  these  conditions, 
Recruiting  Command  was  able  to  cut  recruiter  strength  25%  and  reduce  advertising 
budgets  over  50%  between  1989  and  1994  (Orvis,  1996).  USAREC  also  reduced 
overhead  in  the  organization  of  its  subordinate  commands.  In  1992,  it  consolidated  its 
subordinate  units  from  a  five-brigade  to  a  four-brigade  structure. 

Towards  the  conclusion  of  the  drawdown,  the  Army  entered  a  “steady  state,” 
meaning  that  every  soldier  who  left  the  service  had  to  be  replaced  by  a  new  recruit.  As  a 
result,  accession  requirements  began  to  increase  slightly  in  1995.  In  response,  USAREC 
returned  to  a  five-brigade  structure,  and  has  increased  recruiter  strength  15%  since  1997. 
It  is  now  initiating  a  Corporal  Recruiter  Program  to  employ  younger  soldiers  to  better 
relate  to  its  target  audience  (Dickey,  1999).  USAREC  is  also  offering  shorter  enlistment 
terms,  and  in  1999  began,  for  the  first  time,  to  combine  enlistment  bonuses  with  the 
Army  College  Fund.  Despite  these  efforts,  the  Army  has  failed  to  meet  its  recruiting 
requirements  every  year  since  1997. 

Several  factors  in  the  recruiting  environment  have  contributed  to  the  Army’s 
recent  shortfalls.  The  lack  of  a  military  threat  to  the  nation  decreases  the  perceived  need 
to  serve.  By  1999,  the  military-to-civilian  pay  gap  had  grown  to  13.5%,  its  widest  level 
since  1979  (Parlier,  1999).  The  country  has  experienced  the  lowest  sustained 
unemployment  rate  in  thirty  years  (Bureau  of  Labor  Statistics,  1999).  Finally,  the 
increasing  number  of  youth  attending  post-secondary  education  and  the  increasing 
financial  return  on  a  college  degree  are  two  related  trends  that  are  thought  to  have  had  a 
significant  impact  on  the  market  for  high-quality  youth.  In  just  a  four-year  period  starting 
in  1990,  the  number  of  18-19  year-old  youths  attending  post-secondary  education 
increased  5%  to  60.2%,  and  the  number  of  20-21  year-olds  increased  13%  to  44.9% 
(Asch,  1999).  Despite  an  increase  in  supply  of  college  graduates,  the  wages  they  earn 
have  continued  to  increase  relative  to  those  of  high  school  graduates.  This  indicates  that 
the  demand  for  the  skills  that  these  graduates  bring  to  the  workplace  continues  to  be 
greater  than  the  supply. 
B. REVIEW OF PREVIOUS RECRUITING RESEARCH


A  significant  number  of  studies  focus  on  predicting  recruiting  production  for  the 
All-Volunteer  Force.  This  research  provides  insight  into  influential  factors  and  the 
methods  used  to  identify  them.  The  following  two  studies  do  not  include  results  from 
multivariate  regression,  but  do  provide  useful  background  information  on  important 
variables  and  trends  in  Army  recruiting. 

1.  General  Background  Studies 

Orvis,  Sastry,  and  McDonald’s  1996  Military  Recruiting  Outlook  breaks  the 
recruiting process into two major factors: “supply of potential enlistees” and “conversion of
potential  supply”  (Orvis,  1996).  The  researchers  employ  single-variable  regression  of 
specific  indicators  to  identify  trends  in  propensity  and  in  conversion  of  potential  supply. 
The  authors  determine  that  the  predicted  supply  for  FY  94  and  95,  as  measured  by 
propensity  of  high-quality  recruits  to  enlist  from  the  YATS  results,  was  actually  greater 
than  pre-drawdown  levels  of  supply.  This  suggests  that  any  shortfalls  in  recruiting  for 
these  two  years  resulted  from  the  Army’s  inability  to  convert  supply  to  enlistments.  The 
study  reveals  that  the  trend  in  propensity  to  enlist  was  decreasing,  especially  for 
minorities.  The  authors  predict  (accurately  in  retrospect)  that  by  FY  97  this  trend, 
combined  with  the  increasing  post-drawdown  accession  requirements,  would  result  in  the 
service  facing  a  supply  shortage  in  addition  to  its  conversion  difficulties.  The  study 
recommends  further  research  to  identify  causal  factors  for  the  conversion  shortcomings. 

Asch, Kilburn, and Klerman’s 1999 RAND study, Attracting College-Bound
Youth  into  the  Military,  suggests  that  recent  recruiting  shortcomings  are  a  result  of 
permanent  changes  in  the  civilian  labor  market.  Specifically,  they  state  that  the  increase 
in  the  college  premium,  which  is  the  difference  between  the  average  real  wage  of  a 
college  degree  holder  and  that  of  a  high  school  diploma  holder,  is  driving  more  high- 
quality  youth  to  seek  post-secondary  education.  Hence,  their  research  indicates  that  all 
services  are  increasingly  competing  against  higher  education  and  not  the  immediate  labor 
market  for  TSC  I-IIIA  recruits.  The  researchers  use  existing  economic  models  of 
recruiting supply and conduct statistical analysis of various factors to arrive at their
policy  recommendations. 

2.  Multivariate  Time  Series  Studies 

The  following  five  studies  use  multivariate  regression  analysis  and/or  time  series 
regression analysis to examine factors affecting recruiting production. All five focus on
the  high-quality  enlistee  category. 

Robert  Cotterman  wrote  Forecasting  Enlistment  Supply:  A  Time  Series  of  Cross 
Sections  Model  for  a  1986  RAND  study.  The  author  develops  a  model  that  predicts 
monthly  enlistment  rates  for  each  service  in  each  state  based  on  three  empirical  factors 
and  68  indicator  variables.  Cotterman  uses  monthly  state-level  data  for  each  service  over 
a  78-month  period  starting  in  1974.  One  of  the  model’s  distinguishing  features  is  that  the 
covariance  structure  allowed  correlation  in  disturbances  across  periods,  across  services, 
and across- and within-state components. By using a time series of cross-sections the
author  avoids  collinearity  problems  associated  with  using  purely  time-series  data 
(Cotterman,  1986).  The  first  factor  in  the  model  represents  the  position  in  the  business 
cycle  by  a  measure  of  a  state  unemployment  rate’s  deviation  from  its  trend.  The  second 
factor  is  a  ratio  of  military  compensation  to  manufacturing  wages.  The  last  empirical 
factor  is  a  ratio  of  recruiting  force  strength  to  the  target  male  population  size.  Indicator 
variables  include  month,  state,  and  GI  Bill  availability.  The  model’s  forecasts  for  FY  81 
differ  from  the  actual  results  by  2%  to  13%.  It  is  most  accurate  for  the  Air  Force.  All 
predictors  demonstrate  expected  behavior  and  unemployment  is  the  most  significant 
factor.  The  author  concludes  that  the  covariance  structure  developed  in  this  model 
reduces  the  standard  error  of  the  estimates  from  those  in  earlier  models. 
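
As an illustration of the first factor, the deviation of an unemployment rate from its trend can be computed by fitting a trend and subtracting it. The sketch below assumes a linear trend fitted by least squares, which may differ from the detrending Cotterman actually used; the function name and the rate values are hypothetical.

```python
import numpy as np

def deviation_from_linear_trend(rate):
    """Return the series minus its least-squares linear trend."""
    rate = np.asarray(rate, dtype=float)
    t = np.arange(len(rate))
    slope, intercept = np.polyfit(t, rate, deg=1)   # highest power first
    trend = intercept + slope * t
    return rate - trend

# Illustrative monthly unemployment rates in percent (not real data).
unemployment = [7.7, 7.6, 7.4, 7.1, 6.9, 7.0, 6.6, 6.5, 6.1, 6.2]
dev = deviation_from_linear_trend(unemployment)
print(np.round(dev, 2))
```

Positive values mark months in which unemployment sits above its trend, which is the sense in which the factor locates a state in the business cycle.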

Lewis  (1987)  constructs  a  time-series  of  cross  sections  regression  model  of  30 
environmental  factors  on  Army  recruiting  production  for  TSC  I-IIIA  males.  Lewis  groups 
the  factors  into  five  major  categories:  economic,  socio-demographic,  recruiting  resources, 
enlistment  policies,  and  enlistment  competition.  The  data  used  covers  the  period  from 
FY80  to  FY84  and  is  geographically  based  on  55  of  the  existing  56  recruiting  battalions. 
The  research  concludes  that  the  four  most  positive  environmental  factors  are  relative 
military pay, unemployment, recruiter strength, and advertising. The most negative
factors  are  minority  representation  in  the  population,  college  degree  density,  and  the 
introduction  of  a  less  robust  college  fund  program. 

Dertouzos  and  Polich’s  1989  RAND  study,  Recruiting  Effects  of  Army 
Advertising,  is  one  of  the  first  research  projects  to  differentiate  between  various 
advertising  media.  Their  study  uses  monthly  data  for  a  three-year  period  from  1981  to 
1984  for  66  geographical  areas  defined  by  the  boundaries  of  the  Military  Entrance 
Processing  Stations  (MEPS).  The  model  controls  for  economic  and  demographic 
conditions  and  intensity  of  recruiter  effort.  The  dependent  variable  is  high-quality 
enlistments  predicted  by  the  number  of  low-quality  recruits,  local  supply  factors 
(unemployment  rate,  manufacturing  wages,  recruiter  strength,  bonus  programs), 
advertising  intensity  by  medium,  and  recruiter  activity.  The  most  significant  supply 
factors  are  recruiter  strength  and  unemployment  rate.  The  researchers  compare  the 
marginal  return  on  advertising,  recruiter  staffing,  and  cash  bonuses  and  conclude  that 
advertising  is  the  most  cost-effective  of  these  three  resources.  The  study  reveals  that 
national  magazines  and  local  newspapers  are  the  most  effective  media  followed  by 
national  radio  and  network  television.  Dertouzos  and  Polich  determine  that  the  most  cost- 
effective  media  are  national  magazine  and  newspaper  advertisements. 

John  Warner  and  Beth  Asch  summarize  the  results  of  a  number  of  empirical 
models  of  enlistment  supply  in  their  1995  paper,  The  Economics  of  Military  Manpower. 
They  state  that  there  have  been  two  generations  of  models  since  the  beginning  of  the  All- 
Volunteer Force. The general form of the first models is $\ln H = \beta \ln X$, in which H
represents the number of high-quality enlistees and X represents a vector of supply
variables. The advantage of the logarithmic form is that the variable coefficients can be
easily interpreted as “supply elasticities” (Warner, 1995). The authors declare that the
second  generation  of  models  first  appeared  in  1986  and  began  to  account  for  the  behavior 
of recruiters. The form of these models is $\ln H = \lambda \ln L + \beta \ln X + \ln E$, where H and X
represent the same elements as in the earlier models, L represents the number of
low-quality recruits, and E represents a measure of recruiter effort based on quotas. The
results of the second-generation models consistently indicate that unemployment rates
and  relative  civilian-to-military  pay  ratios  are  significant  factors  in  the  recruiting  process. 
The  authors  conclude  that  the  number  of  recruiters  is  the  most  significant  recruiting 
resource  factor. 
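
The elasticity reading of the logarithmic form can be made concrete with a small numerical sketch. It fits a log-log regression with a single supply variable on synthetic data and converts the estimated coefficient into a percentage effect. The intercept term, the choice of unemployment as the supply variable, the true elasticity of 0.5, and all variable names are assumptions for illustration, not values from Warner and Asch.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
unemployment = rng.uniform(4.0, 10.0, size=n)   # hypothetical supply variable X
true_elasticity = 0.5                            # assumed for the simulation

# Generate ln H from the log-log form: ln H = a + beta * ln X + noise
log_h = 6.0 + true_elasticity * np.log(unemployment) + rng.normal(0.0, 0.01, size=n)

# Estimate the intercept and beta by least squares
design = np.column_stack([np.ones(n), np.log(unemployment)])
coef, *_ = np.linalg.lstsq(design, log_h, rcond=None)
beta_hat = coef[1]

# Elasticity reading: a 10% rise in X multiplies H by 1.10 ** beta
pct_change = (1.10 ** beta_hat - 1.0) * 100.0
print(f"estimated elasticity: {beta_hat:.3f}")
print(f"10% rise in X -> {pct_change:.1f}% change in H")
```

Because both sides are in logarithms, the coefficient is the approximate percentage change in high-quality enlistments per one percent change in the supply variable, regardless of the units in which either is measured.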

Dan  Goldhaber  published  a  critical  review  of  the  Navy’s  Enlisted  Goaling  Model 
in  a  1999  report  for  The  Center  For  Naval  Analyses.  The  Navy  Recruiting  Command 
uses  this  model  to  predict  high-quality  recruit  contracts  on  a  quarterly  basis.  The  Navy 
and  Army  definition  of  high-quality  enlistees  is  the  same.  The  model’s  independent 
variables  include  recruiter  strength,  seasonally  adjusted  unemployment  rates,  a  military- 
to-civilian  pay  ratio,  YATS  propensity  figures,  combined  Army  and  Navy  advertising 
expenditures,  veteran  population  figures,  and  additional  indicator  variables  for 
demographics,  seasonality,  and  policy  measures.  In  the  model,  all  non-binary  variables 
are  in  logarithmic  form.  The  model  uses  an  autoregressive  form  to  account  for  correlation 
between  recruiting  production  in  successive  quarters.  The  model’s  predictions  from  1994 
to  1999  are  within  10  percent  of  actual  production  results.  Goldhaber  uses  data  from  1992 
to  1998  to  analyze  the  structure  and  components  of  the  model.  He  concludes  that 
collinearity  of  the  predictive  variables  did  not  cause  bias  in  the  predictions.  He  finds  that 
the  existing  first-order  autoregressive  form  of  the  model  is  appropriate.  Finally, 
Goldhaber  suggests  that  feedback  from  recruiting  success  influences  advertising 
budgeting.  Hence,  he  recommends  removing  advertising  as  a  predictive  variable  to 
prevent  potential  biases  in  the  coefficient  estimates  and  the  model  predictions. 
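
A first-order autoregressive form can be sketched as follows. This is one common reading, in which each quarter's value depends on the previous quarter's value plus noise; the report reviewed here may instead place the autoregression on the regression errors. The sketch simulates such a series and recovers the coefficient by least squares. The function names, parameter values, and data are hypothetical.

```python
import numpy as np

def simulate_ar1(n, const, phi, sigma, seed=0):
    """Simulate y[t] = const + phi * y[t-1] + noise, started at the stationary mean."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    y[0] = const / (1.0 - phi)
    for t in range(1, n):
        y[t] = const + phi * y[t - 1] + rng.normal(0.0, sigma)
    return y

def fit_ar1(y):
    """Estimate (const, phi) by regressing y[t] on y[t-1]."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef[0], coef[1]

series = simulate_ar1(n=400, const=2.0, phi=0.6, sigma=0.1, seed=42)
const_hat, phi_hat = fit_ar1(series)

# One-step-ahead forecast from the last observed value
one_step = const_hat + phi_hat * series[-1]
print(f"phi estimate: {phi_hat:.2f}, next-period forecast: {one_step:.2f}")
```

A coefficient between zero and one means that an unusually strong or weak quarter carries over partially into the next, which is the correlation the autoregressive form is meant to absorb.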

3.  Summary 

These  studies  provide  insight  into  what  factors  have  been  influential  in  predicting 
recruiting  production  in  the  past.  Over  the  period  encompassed  by  these  works,  the 
mission and composition of the Army have changed dramatically. The quality of the force
as  measured  by  the  number  of  high  school  graduates  enlisting  has  drastically  improved, 
increasing  from  16%  in  1979  to  over  90%  throughout  the  1990s.  Despite  these  major 
changes,  in  all  but  one  multivariate  regression  analysis,  unemployment  is  the  most 
influential  predictor  of  recruiting  production  for  high  quality  enlistees.  In  that  one  study, 
unemployment ranks second to recruiter strength as the most significant indicator. More
recent  works  suggest  that  competition  with  post-secondary  education  and  not  with  the 
unskilled  labor  market  is  a  factor  with  growing  importance  in  predicting  recruiting  for 
high-quality  youth. 

C.  EXISTING  USAREC  PRODUCTION  MODELS 

One  of  the  forecasting  tools  USAREC  uses  is  the  Command  Level  Mission  Model 
(CLEMM).  It  is  a  model  that  predicts  production  at  the  Battalion  level  as  a  function  of 
major  demographic  indicators  and  recruiter  intensity.  In  this  model,  recruiter  intensity  is 
measured by recruiter strength and operational policies (e.g., the number of recruiting
workdays  in  a  month,  which  can  be  controlled  by  varying  the  number  of  mandated 
working  Saturdays).  Historically,  the  model  has  had  an  accuracy  rate  within  5%  for  the 
TSC  I-IIIA  category,  but  is  extremely  labor-intensive  to  support.  Though  the  model  is 
still  maintained  by  USAREC  and  used  by  the  Enlisted  Accessions  Branch  of  the 
ODCSPER,  USAREC  has  abandoned  CLEMM  in  favor  of  a  predictive  model  based  on 
recent  production  performance  (Pettit,  1999).  Beyond  CLEMM  there  are  no  large-scale 
models  currently  in  use  by  USAREC  that  predict  production  by  incorporating  policy, 
resource,  demographic,  and  economic  predictors  (Kaylor,  1999). 

III. DATA


A.  FACTOR  SELECTION  AND  DESCRIPTION 

Sixteen  factors  are  initially  included  for  examination  in  this  study  as  predictive 
variables.  They  are  selected  based  on  their  appearance  in  previous  models  or  in  recent 
research.  The  intent  of  selecting  this  large  number  of  factors  is  to  determine  if  factors 
included  in  older  models  are  still  significant  indicators  for  predicting  recruiting 
production  and  to  determine  if  new  factors  postulated  to  be  influential  are  in  fact 
significant.  Unless  otherwise  specified,  all  data  was  provided  by  the  Defense  Manpower 
Data  Center  (DMDC).  This  data  was  originally  compiled  for  a  study  of  the  Navy  College 
Fund  being  conducted  by  Dr.  John  Warner  of  Clemson  University.  The  factors  are 
described  below. 

The first factor is mission. It reflects the numerical goal for contracts with male high
school graduates and high school seniors in AFQT categories I-IIIA (high-quality) set
by Recruiting Command Headquarters for its subordinate Recruiting Brigades to meet
each month. It is selected to account for the effort that the production recruiters and their
commanders expend in order to meet their assigned recruiting mission. It is also selected
to implicitly account for incentives, awards, and bonuses offered to the recruiters on the
assumption that the magnitude of rewards is adjusted to correspond with the demands of
the mission.

The  second  factor  selected  is  recruiter  strength,  which  reflects  the  number  of 
production  recruiters  assigned  to  each  brigade.  Production  recruiters  are  the 
noncommissioned  officers  whose  job  is  to  make  contacts  with  potential  recruits  and  write 
enlistment  contracts.  Production  recruiters  represent  a  critical  subgroup  of  the  personnel 
assigned  to  Recruiting  Command  and  in  this  context  are  distinct  from  commanders, 
staffs,  and  government  civilians.  In  previous  studies,  recruiter  strength  has  been  identified 
as one of the most cost-effective factors affecting recruiting production.

The third, fourth, fifth, and sixth factors represent the impressions made by Army
advertising  in  television,  radio,  magazines,  and  newspapers  respectively.  These  factors 
measure  the  total  audience  exposures  for  each  national  advertising  campaign  in  a  month. 
They  do  not  reflect  the  impact  of  local  area  marketing  programs.  The  television 
impressions  are  measured  for  advertising  run  on  both  network  and  cable  programming. 
The  newspaper  impressions  combine  papers  with  national  distribution  along  with  college 
campus  newspapers.  The  Army’s  contracted  advertising  agencies  provide  impression 
figures  for  various  demographic  groups.  For  this  study,  the  audience  addressed  is  15-24 
year-old  males. 

The  percent  of  eligible  recruits  receiving  the  Army  College  Fund  (ACF)  when 
enlisting  is  the  seventh  factor.  The  ACF  provides  an  incentive  for  youth  to  join  the  Army 
with  the  promise  of  dedicated  money  for  post-secondary  education  upon  completion  of 
service. The ACF is only available to potential recruits in TSC I-IIIA. It offers funding in
addition  to  the  Montgomery  G.I.  Bill,  which  is  offered  to  all  Tier  1  enlistees.  Research 
conducted  by  Beth  Asch  in  her  1999  RAND  study  suggests  that  as  a  greater  proportion  of 
young  people  attend  college,  this  program  may  be  increasingly  effective  in  attracting 
college-bound  youth  to  enlist. 

The  eighth  factor  selected  is  the  target  population  size.  This  figure  represents  the 
total  males  in  AFQT  categories  I-IIIA  in  each  recruiting  region.  This  factor  is  included 
because  Eitelberg  and  Mehay  (1994)  predict  that  a  decreasing  youth  population  will 
compound  recruiting  challenges. 

The  unemployment  rate  is  the  ninth  factor  selected.  This  figure  is  the  ratio  of 
unemployed  to  the  civilian  labor  force  expressed  as  a  percentage.  The  data  is  directly 
extracted  from  the  Bureau  of  Labor  Statistics’  Local  Area  Unemployment  Statistics 
(BLS,  Selective  Data  Access).  The  unemployment  rate  chosen  is  not  seasonally  adjusted, 
because  the  models  developed  in  this  thesis  include  indicator  variables  for  month. 
Unemployment  appears  as  a  significant  factor  in  all  models  reviewed.  It  represents  the 
competition  for  youth  that  the  service  faces  from  the  civilian  labor  market. 

The next factor is high school graduate wages, a measure of the average weekly
wage earned by male high school graduates in each state. This data is extracted from the
Monthly Current Population Survey, which is a joint project of the Bureau of Labor
Statistics and the Census Bureau. Individuals surveyed are males age 18-35. To be
included in the survey, an individual must normally work at least 30 hours per
week. Like unemployment, youth wages are included in all of the previous
models  examined. 

The  overall  17-21  year-old  college  attendance  rate  represents  the  eleventh  factor. 
This  rate  is  determined  by  dividing  the  college  population  for  each  state  by  the  total 
youth  population.  These  figures  are  extracted  from  data  compiled  by  Woods  and  Poole 
Economics,  Inc.,  an  independent  firm  that  produces  county-level  economic  and 
demographic  projections.  DMDC  provided  this  data  for  use  in  this  thesis.  The  attendance 
rate  in  this  model  is  not  specific  to  gender  or  to  the  Army’s  “high-quality”  criterion.  This 
factor  is  another  measure  of  the  competition  for  bright  young  people  between  the  military 
and  post-secondary  education. 

The  college  premium  represents  the  final  factor  addressing  the  Army’s 
competition  with  colleges  for  qualified  personnel.  In  this  thesis,  the  figure  represents  the 
difference  in  weekly  wages  between  male  high  school  graduates  and  college  graduates. 
Like  the  tenth  factor,  it  is  derived  from  the  Monthly  Current  Population  Survey. 

The  thirteenth,  fourteenth,  and  fifteenth  factors  represent  the  monthly  recruiting 
success,  measured  in  signed  contracts,  of  the  Air  Force,  Marine  Corps,  and  Navy  in  the 
high-quality  male  demographic.  This  category  is  included  to  determine  if  the  relationship 
between  the  recruiting  efforts  of  other  services  is  competitive  or  complementary. 

The  final  eleven  factors  are  binary  indicator  variables  for  each  of  the  months  from 
February  through  December.  January  represents  the  baseline  month  and  is  not 
specifically  represented  by  an  indicator  variable. 

B. DATA AGGREGATION


The original data for the response variable and the fifteen non-month predictor
variables differed in their geographic and time divisions. The following table shows the
original form of each variable:


Variable                                                   Index  Geographic Division  Time Division
Army high-quality male contracts                           NA     County               Monthly
Recruiting mission                                         1      County               Monthly
Recruiter strength                                         2      County               Monthly
TV advertising impressions                                 3      State                Monthly
Radio advertising impressions                              4      State                Monthly
Magazine advertising impressions                           5      State                Monthly
Newspaper advertising impressions                          6      State                Monthly
Percent of eligible recruits receiving the college option  7      State                Monthly
Target population size                                     8      County               Yearly
Unemployment                                               9      State                Monthly
High school graduate wages                                 10     State                Yearly
College attendance rate                                    11     State                Yearly
College wage premium                                       12     State                Yearly
Air Force high-quality male contracts                      13     County               Monthly
Marine Corps high-quality male contracts                   14     County               Monthly
Navy high-quality male contracts                           15     County               Monthly

Table  3.1  The  Original  Form  of  the  Data  for  the  Selected  Variables. 


In  order  to  develop  a  separate  model  for  each  recruiting  brigade,  the  original  data 
is  aggregated  geographically  to  reflect  USAREC’s  current  regional  boundaries.  The 
response variable and six of the independent variables are enumerated at the county level
in  the  data  provided  by  DMDC.  A  brigade-to-FIPS  county  code  file  provided  by 
USAREC  is  used  to  index  the  data  for  each  of  the  3,116  individual  counties  to  its 
appropriate  recruiting  brigade.  The  data  is  then  summed  for  each  brigade  for  each  month. 
For each of these seven variables there are 215,004 records (3,116 counties multiplied by
the  number  of  months  in  the  series).  Due  to  the  large  size  of  these  files,  the  aggregation 
procedure  is  executed  using  SAS  programs  developed  by  Dennis  Mar  of  the  Naval 
Postgraduate  School  Systems  Management  Department.  For  each  variable,  the  number  of 
cases  in  which  records  have  a  FIPS  identifier  that  does  not  exist  in  the  indexing  code  is 
less  than  0.10%. 

The  data  for  the  variables  originally  organized  at  the  state  level  is  converted  to  the 
appropriate  regional  structure  using  weights  derived  from  the  target  population.  The 
weights  are  calculated  using 

\[ \mathit{weight}_{sr} = \frac{\mathit{targetPopulation}_{sr}}{\mathit{targetPopulation}_{s}}, \]

where  the  subscript  sr  indicates  the  state,  s,  in  region  r.  For  states  that  are  divided  among 
regions,  the  numerator  for  the  weight  calculation  is  the  portion  of  the  state’s  target 
population  in  each  region.  Once  the  weights  are  calculated,  they  are  multiplied  by  the 
state  figure  for  an  independent  variable.  These  values  are  then  summed  over  states  to 
determine  a  figure  for  the  brigade,  as  shown: 

\[ \mathit{indepVariable}_{r} = \sum_{s} \left( \mathit{weight}_{sr} \cdot \mathit{indepVariable}_{s} \right). \]
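
The state-to-brigade conversion described above can be sketched in a few lines of Python. This is an illustration only, not the SAS or S-Plus code used in the thesis. It assumes the weight is read as the share of a state's target population that falls inside the region; the function name, the dictionaries, and the figures are hypothetical.

```python
def regional_value(state_values, pop_in_region, state_pop):
    """Weighted sum over states: sum_s (pop_sr / pop_s) * value_s.

    Assumption: the weight for state s in region r is the portion of the
    state's target population located in r divided by the state's total.
    """
    total = 0.0
    for state, value in state_values.items():
        weight = pop_in_region.get(state, 0.0) / state_pop[state]
        total += weight * value
    return total


# Hypothetical figures: state A lies wholly in the region, state B is split.
state_pop = {"A": 1000.0, "B": 2000.0}
pop_in_region = {"A": 1000.0, "B": 500.0}
impressions = {"A": 400.0, "B": 800.0}

# 400 * 1.0 + 800 * 0.25
print(regional_value(impressions, pop_in_region, state_pop))
```

Under this reading a state that lies entirely inside one region contributes its full figure, while a split state contributes in proportion to its target population in each region.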


The  four  independent  variables  originally  represented  in  annual  time  divisions  are 
converted  to  monthly  figures  by  developing  a  linear  relationship  between  the  data  points. 
Monthly  figures  are  determined  by 


\[ \mathit{indepVariable}_{ym} = \mathit{indepVariable}_{y} + \left( \frac{\mathit{indepVariable}_{y+1} - \mathit{indepVariable}_{y}}{12} \right) \cdot \mathit{monthNumber}, \]

where y represents year and m month. Figures for target population size and college
attendance rates are calculated assuming that the original data points are for the month of
January. The figures for high school graduate wages and college premium are calculated
assuming that the original data points are for the month of June. The monthNumber factor
reflects  the  months,  ordered  1-12,  between  the  original  annual  data  points.  This  linear 
transformation  of  annual  observations  has  the  potential  to  produce  additional  noise  in  the 
process.  Original  data  with  monthly  observations  is  preferable,  but  is  unavailable. 
However,  target  population  figures  and  college  attendance  rates  are  not  expected  to 
change  significantly  month-to-month.  The  greatest  potential  for  error  induction  is  for  the 
college  premium  and  especially  the  high  school  graduate  wage  level,  both  of  which  could 
potentially  exhibit  seasonal  behavior. 
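
A minimal Python sketch of the annual-to-monthly conversion follows. It assumes that monthNumber counts the months after the annual anchor point, so that monthNumber = 12 lands on the next annual observation; the function name and the sample values are hypothetical.

```python
def monthly_from_annual(value_y, value_next, month_number):
    """Linear interpolation between two consecutive annual observations.

    month_number runs 1..12 and counts months after the anchor month of
    year y (an assumption about how the months are ordered).
    """
    return value_y + (value_next - value_y) / 12.0 * month_number


# Hypothetical annual wage figures of 120 and 132 in consecutive years.
series = [monthly_from_annual(120.0, 132.0, k) for k in range(1, 13)]
print(series[0], series[-1])
```

Because each interpolated value lies on a straight line between two annual points, any seasonal movement within the year is lost, which is the source of the noise discussed above.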

C.  EXPERIMENTAL  DESIGN  AND  DATA  SEGREGATION 

The  original  data  represents  a  63-month  period  from  July  1992  through  September  1997. 
The  first  54  months  are  selected  as  an  analysis  data  set  for  training  the  models.  For  all 
model development, T, the number of observations in a series, is initially equal to 54. The
remaining  nine  months  are  reserved  as  the  validation  data  set  to  test  the  accuracy  of  the 
models’  forecasts.  The  series  extrema  and  averages  for  the  full,  test  and  training  data  sets 
are  listed  in  Appendix  A. 

IV. METHODOLOGY


A  time  series  is  a  sequence  of  observations  made  over  time.  The  guiding  principle 
behind  time  series  analysis  and  forecasting  is  that  the  future  can  be  predicted  based  on 
determination  of  patterns  in  past  data  (Bowerman,  1979).  In  multivariate  time  series 
analysis  the  analysis  task  is  expanded  to  include  the  determination  of  the  interrelationship 
between  multiple  series  (Chatfield,  1996).  Based  on  the  relatively  small  size  of  the 
recruiting  data  set  and  the  varying  scales  of  the  regressor  values,  determining  which 
factors are significant is problematic. The means employed to overcome this difficulty
is  the  bootstrap  (Efron,  1998).  It  consists  of  resampling  from  within  the  existing  data  to 
provide  robustness  in  the  factor  determination  process.  Use  of  the  bootstrap  technique 
requires  that  the  residuals  of  a  hypothesized  model  be  independent.  This  necessitates  the 
development  of  an  autoregressive  moving  average  (ARMA)  model  for  each  brigade. 

These  requirements  lead  to  the  following  methodology.  First,  select  an 
appropriate  time  series  model  to  produce  residuals  with  structure  suitable  for 
bootstrapping.  Second,  develop  the  bootstrap  recursion.  Third,  develop  a  recursion  to 
conduct  stepwise  reduction  of  the  model  to  identify  significant  factors.  Fourth,  develop 
and  perform  diagnostics  on  a  final  reduced  model.  Finally,  use  this  model  to  predict 
future  recruiting  production  and  determine  the  accuracy  of  the  predictions.  The  process 
described  in  this  chapter  addresses  the  steps  to  develop  one  model  and  one  nine-month 
forecast.  It  is  applied  five  separate  times  to  develop  a  model  and  a  forecast  for  each 
USAREC  brigade. 

A. FITTING MULTIVARIATE ARMA MODELS

For  most  measurements  taken  at  fixed  intervals  over  time,  there  is  an  underlying 
structure  to  the  data.  That  is,  there  exists  an  association  between  one  observation  and  its 
“neighbors.”  One  of  the  primary  tasks  of  the  analysis  is  to  determine  the  strength  of  that 
relationship.  A  multivariate  time  series  also  accounts  for  the  influence  of  regressor  series 
on the response variable. Models of the following form capture these relationships for the
recruiting data:

\[
y_t - \mu_y = \beta_1(\mathit{Mission}_t - \mu_1) + \dots + \beta_{21}\,\mathit{December}
  + \phi_1(y_{t-1} - \mu_y) + \dots + \phi_p(y_{t-p} - \mu_y)
  + \varepsilon_t + \theta_1\varepsilon_{t-1} + \dots + \theta_q\varepsilon_{t-q},
\tag{3.1}
\]


where $y_t$ is the response variable, high quality male contracts, at time t. The $\beta_i$ specify the
relationship of the value of the regressors to the response variable. For both the response
and the non-binary predictive variables the data must be centered on $\mu$, the mean value of
the respective series. The autoregressive parameters, $\phi_p$, capture the strength of the
relationship between the value of the response variable at time t and observations in
previous periods. Uncertainty in the time series process is captured by $\varepsilon_t$, a “purely
random disturbance term with a mean of zero and a variance of $\sigma^2$” (Harvey, 1994). The
moving average parameters, $\theta_q$, represent the relationship between the response and these
disturbances in previous periods. The number of autoregressive parameters, p, and the
number of moving average parameters, q, define the order of an ARMA(p, q) model. In
essence, the multivariate ARMA model for each brigade represents the deviation of the
response variable at time t from the mean of its series by the deviation of the non-binary
predictive variables from their respective series means, the magnitude of the monthly
effect, plus the autoregressive and moving average effects.

Equation 3.1 models the complete series of recruiting data. The values of the
model parameters are estimated through analysis from a sample of this theoretically
infinite series. A number of notational changes are made to distinguish these estimations
from the parameters of the complete series. The series means of the response variable
and the predictive variables, $\mu_y$ and $\mu_i$, are represented by their observed values
$\bar{y}$ and $\bar{x}_i$, where the index i = 1, 2, ..., 15 runs over the indices listed in Table 3.1.
The actual regression coefficients, actual autoregressive and moving average parameters,
and disturbances are estimated by $\hat{\beta}_i$, $\hat{\phi}_p$, $\hat{\theta}_q$, and $\hat{\varepsilon}_t$, respectively. To simplify
the representation of the terms in equation (3.1) and to address the distinctions of sample
data, the following notation is introduced.
The deviation of the response variable, high-quality male recruiting contracts, from the
mean of its series is represented by

\[ z_t = y_t - \bar{y}, \]

while the value of the response variable estimated by a model is represented by $\hat{z}_t$.
The centered regressors are represented by

\[ s_{it} = (x_{it} - \bar{x}_i), \]

where the index i = 1, 2, ..., 15 on each respective $x_{it}$ corresponds to the indices listed in
Table 3.1. The models developed based on the sample data therefore have the form

\[
z_t = \hat{\beta}_1 s_{1t} + \dots + \hat{\beta}_{21}\,\mathit{December}
  + \hat{\phi}_1 z_{t-1} + \dots + \hat{\phi}_p z_{t-p}
  + \hat{\varepsilon}_t + \hat{\theta}_1\hat{\varepsilon}_{t-1} + \dots + \hat{\theta}_q\hat{\varepsilon}_{t-q}.
\tag{3.2}
\]


Models  of  the  form  reflected  in  equation  (3.2)  are  created  using  the  Gaussian 
maximum  likelihood  estimation  method  in  the  S-Plus  statistical  software  package 
(Mathsoft  Inc.,  1999).  Two  criteria  are  used  for  selecting  the  appropriate  ARMA  model 
for  each  brigade.  First,  the  model’s  residuals  must  not  display  any  significant 
autocorrelation  or  partial  autocorrelation.  Second,  the  model  has  to  be  of  the  lowest 
possible  order  while  fitting  the  data  well. 

The autocorrelation, $\rho_k$, represents the strength of the relationship between any
two observations in a time series separated by a lag of k time periods. The autocorrelation
is a dimensionless measure with values between -1 and 1. When $\rho_k$ is large in magnitude,
observations k time units apart tend to move together in a linear fashion, and hence are
not independent. The sign of $\rho_k$ indicates the direction of this movement. The partial
autocorrelation, $\rho_{kk}$, represents the autocorrelation between any two observations
separated by a lag of k after removing the effects of the intervening observations. The
autocorrelation function (ACF) and partial autocorrelation function are lists of $\rho_k$ and $\rho_{kk}$
at lags k = 1, 2, 3, ... (Bowerman, 1979). The graphs of these functions are called
correlograms, which are the actual diagnostic tools used to assess the first criterion. An
example correlogram with lags up to k = 24 is displayed in Figure 3.1.

[Figure: correlogram panel titled "H-Q Male Recruiting Production"]

Figure 3.1 Example Correlogram Produced Using S-Plus

The standard for determining significant ACF and partial ACF involves Bartlett’s
approximation of the variance of autocorrelation estimates, $\mathrm{var}[r_k]$ (see Box and
Jenkins, 1976, pp. 34-35). Values of the ACF at any lag k > 0 which fall outside
$\pm 2\sqrt{\mathrm{var}[r_k]}$ are considered significant. This range is automatically calculated by
S-Plus and indicated by the dotted horizontal lines in the generated correlograms.
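
The significance check on the autocorrelations can be illustrated with a short Python sketch. It is not the S-Plus routine; it assumes the simplest white-noise case of Bartlett's approximation, in which var[r_k] is roughly 1/n, so the bounds are plus or minus 2/sqrt(n). The function names are hypothetical.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(x, max_lag):
    """Lags whose r_k falls outside +/- 2 / sqrt(n).

    Assumption: var[r_k] ~ 1/n, the white-noise form of Bartlett's result.
    """
    bound = 2.0 / math.sqrt(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# A strictly alternating series is strongly autocorrelated at every lag.
alternating = [1.0, -1.0] * 20
print(significant_lags(alternating, 2))
```

For residuals of an adequate model, this list would be expected to be empty or nearly so.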

Box  and  Jenkins  define  the  notion  of  developing  parsimonious  ARMA  (p,  q) 
models,  which  means  choosing  the  smallest  values  of  p  and  q  that  adequately  capture  the 
nature  of  the  time  series.  The  Akaike  information  criterion  (AIC)  captures  the  essence  of 
this  concept.  It  provides  a  tool  for  comparing  models,  by  indicating  a  model’s  goodness 
of  fit  while  penalizing  complexity.  Models  are  selected  by  minimizing  the  AIC,  which  is 
defined  by 

\[ \mathrm{AIC} = -2 \log L(\psi) + 2n. \]

In this equation, $L(\psi)$ is the maximized value of the likelihood and n is the sum of the
ARMA model orders (p + q). Complex models have higher values of $L(\psi)$, hence a large
negative first term, but are penalized for having a large value of n. Simple models have
smaller values of $L(\psi)$, creating a smaller negative first term, but small values of n. The
AIC is a means of balancing these competing characteristics (Harvey, 1994).
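
As a small worked example of the comparison, the following Python sketch computes the AIC for two candidate orders and picks the smaller one. The log-likelihood values are invented for illustration and do not come from the recruiting data.

```python
def aic(log_likelihood, p, q):
    """AIC = -2 log L + 2n, with n = p + q as defined in the text."""
    return -2.0 * log_likelihood + 2.0 * (p + q)


# Hypothetical maximized log-likelihoods for two candidate ARMA orders.
candidates = {(1, 0): -210.4, (2, 1): -209.9}

best = min(candidates, key=lambda order: aic(candidates[order], *order))
print(best)
```

Here the ARMA(2, 1) candidate fits slightly better, but the gain in likelihood is too small to offset the penalty for two additional parameters, so the lower-order model is preferred.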

B.  BOOTSTRAP 

All of the ARMA models developed have random disturbance terms
$(\varepsilon_1, \varepsilon_2, \dots, \varepsilon_T)$. In models that include an AR term, observations from the first p time
periods are used to initiate the autoregressive process and no estimates from these periods
are derived. Hence, the first disturbance term from an AR model is $\varepsilon_{p+1}$. Since the $\varepsilon_t$
represent random disturbance terms, by definition they are assumed to be independent.
The ARMA model residuals are defined by

\[ e_t = z_t - \hat{z}_t, \]

in which $e_t$ is the deviation of the selected model’s response variable value from the
original  observation  at  time  t.  Like  the  random  disturbance  terms,  the  residuals  are 
assumed  to  be  independent  based  on  the  first  modeling  criterion.  Because  of  their  similar 
properties,  the  residuals  are  used  to  approximate  the  random  disturbance  terms.  This 
concept  is  the  basis  for  the  development  of  the  bootstrap  method  in  this  application. 

The bootstrap recursion first samples with replacement from the set of residuals
$(e_{p+1}, e_{p+2}, \dots, e_T)$ to develop a new set of residuals, $e_t^*$. Next, a new series of the response
variable, $z_t^*$, is simulated using the new set of residuals to represent the random
variations, so that $z_t^* = \hat{z}_t + e_t^*$. The final step is to refit an ARMA model of the same
order as the original, using the original regressors paired with the simulated $z_t^*$ series.
The simulated response variable values force slight changes in the estimates of the
$\hat{\beta}_i$, $\hat{\phi}_p$, and $\hat{\theta}_q$. At the conclusion of the recursion, the regression coefficients are saved
to a vector. The applicable autoregressive and/or moving average parameters are saved to
their own vectors.


Concatenation  of  the  output  vectors  for  multiple  repetitions  of  the  bootstrap 
recursion  creates  a  matrix  of  regression  coefficients.  Each  column  of  this  matrix 
represents  the  developed  regression  coefficients  for  one  factor.  Analysis  of  the  central 
tendency and variance of the figures in each column provides robustness in determining
the value of the $\hat{\beta}_i$ and, hence, the influence of each factor on the response variable.

The  method  of  generating  the  approximations  to  the  random  disturbance  terms 
described  above  represents  a  non-parametric  approach.  The  original  distribution  of  the 
residuals  used  to  represent  the  random  disturbance  terms  is  preserved  in  this  method.  An 
alternate  approach  is  also  developed  in  which  the  random  variations  are  generated  by 
sampling  from  a  normal  distribution  with  a  mean  of  zero  and  a  standard  deviation  equal 
to the standard deviation of the set of residuals, $(e_{p+1}, e_{p+2}, \dots, e_T)$. The S-Plus code for
executing  this  bootstrap  recursion  allows  specification  of  whether  to  use  the  parametric  or 
non-parametric  approach  for  sampling  from  the  residuals.  The  default  method  is  non- 
parametric. 

The  underlying  theory  for  the  development  of  this  recursion  is  due  to  Efron  and 
Tibshirani  (1998,  chapter  8).  See  Appendix  B  for  the  bootstrap  recursion  S-Plus  code. 
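
The residual bootstrap described in this section can be sketched in Python as follows. This is not the Appendix B code. To stay self-contained it makes two loud simplifications: the model has regression and autoregressive terms only (no moving average terms), and it is fitted by ordinary least squares rather than by Gaussian maximum likelihood. All function names are hypothetical.

```python
import random
import statistics


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(z, s, p):
    """Least-squares fit of z_t on regressors s[t] and p lags of z.

    Returns (coefficients, fitted values, residuals) for t = p..T-1.
    Stand-in for the maximum likelihood ARMA fit used in the thesis.
    """
    rows = [list(s[t]) + [z[t - j] for j in range(1, p + 1)]
            for t in range(p, len(z))]
    y = z[p:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    resid = [yt - f for yt, f in zip(y, fitted)]
    return coef, fitted, resid


def bootstrap(z, s, p, reps, parametric=False, seed=0):
    """Return one row of refitted coefficients per bootstrap repetition."""
    rng = random.Random(seed)
    _, fitted, resid = fit(z, s, p)
    sd = statistics.stdev(resid)
    draws = []
    for _ in range(reps):
        if parametric:
            e_star = [rng.gauss(0.0, sd) for _ in resid]   # normal errors
        else:
            e_star = [rng.choice(resid) for _ in resid]    # resample residuals
        # Keep the first p observations, then use fitted value plus new error.
        z_star = z[:p] + [f + e for f, e in zip(fitted, e_star)]
        draws.append(fit(z_star, s, p)[0])
    return draws
```

Each row of the returned list holds the regression coefficients followed by the autoregressive parameters, so stacking the rows gives the coefficient matrix whose columns are summarized in the text.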

C.  STEPWISE  REDUCTION 

The  bootstrap  recursion  provides  a  tool  for  overcoming  some  drawbacks  of 
having  a  limited  number  of  time  series  observations  from  which  to  determine  the 
relationship  between  the  predictive  and  the  response  variables.  The  next  stage  of  model 
development  addresses  the  research  objective  of  identifying  the  significant  factors  in 
predicting  recruiting  production. 

The underlying premise for the elimination of factors is as follows. If the mean
value of a regression coefficient, $\hat{\beta}_i$, calculated from multiple iterations of the bootstrap
is within a defined interval around zero, it can be interpreted as not significantly different
from zero (Efron, 1998). If this is the case, the associated factor is not considered
influential in predicting the behavior of the dependent variable. Such a factor can be
eliminated from the model, which promotes model simplicity and improves model
accuracy by removing noise associated with the discarded factor.

Experimentation  with  the  application  of  this  idea  for  identifying  non-significant 
factors  provides  valuable  insight.  Initially,  factors  that  had  means  in  a  defined  range 
around  zero  were  removed  all  at  once.  The  reduced  ARMA  models  produced  by  this 
approach  gave  unpredictable  results.  Specifically,  the  regression  coefficients  for  the 
remaining  factors  demonstrated  sign  changes  from  the  original  to  the  reduced  model. 
This  observation  led  to  the  development  of  a  stepwise  recursion  to  promote  stability  as 
the  model  is  reduced. 

The intent of the recursion is to eliminate factors one-by-one until only significant
factors remain in the model. The recursion first executes the bootstrap to develop a
matrix of $\hat{\beta}_i$. Factors are identified as candidates for elimination if the mean of a column
of regression coefficients lies within the range of $0 \pm \alpha \cdot$ (standard deviation of the column
of regression coefficients). The $\alpha$ term is an input parameter that controls the width of
the interval. For each candidate factor, the proportion of the estimated $\hat{\beta}_i$ in the range
$0 \pm \alpha \cdot$ (standard deviation of the column of regression coefficients) is calculated. The
candidate factor that has the highest proportion is eliminated and a new ARMA model of
the original order is refit. This recursion is repeated until no factors are eliminated. The
final significant factors are then displayed.
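
One pass of the elimination rule can be sketched in Python as below. The sketch takes a matrix of bootstrap coefficients as given (one row per repetition, one column per factor) and returns the factor to drop next; refitting and repeating are left out. The function name, the tolerance argument name, and the sample matrix are hypothetical.

```python
import statistics


def factor_to_drop(coef_matrix, names, alpha=1.0):
    """Return the name of the factor to eliminate next, or None.

    A factor is a candidate when the mean of its column lies within
    0 +/- alpha * (standard deviation of the column). Among candidates,
    the one with the largest share of estimates inside that interval
    is chosen.
    """
    best_name, best_share = None, -1.0
    for j, name in enumerate(names):
        column = [row[j] for row in coef_matrix]
        half_width = alpha * statistics.stdev(column)
        if abs(statistics.mean(column)) > half_width:
            continue  # mean is outside the interval: keep this factor
        share = sum(abs(b) <= half_width for b in column) / len(column)
        if share > best_share:
            best_name, best_share = name, share
    return best_name


# Hypothetical bootstrap output: four repetitions, three factors.
draws = [
    [5.0, -1.0, 0.1],
    [5.2, 1.0, -0.1],
    [4.8, -0.5, 0.2],
    [5.1, 0.5, -3.0],
]
print(factor_to_drop(draws, ["A", "B", "C"]))
```

In this toy matrix factor A is clearly away from zero, while B and C are both candidates; C is dropped because three of its four estimates fall inside the interval, against two of four for B.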

The S-Plus code for executing the stepwise reduction allows control of the
number of iterations of the bootstrap recursion, the method of residual sampling
(parametric or non-parametric), and $\alpha$, the tolerance defining the size of the interval
around zero. The accepted standard for repetition of the bootstrap is 1,000 iterations
(Efron, 1998). Non-parametric residual sampling is employed and a value of 1 is used for
$\alpha$. The code for the stepwise reduction recursion is in Appendix B.

D. FINAL MODEL DEVELOPMENT AND DIAGNOSTICS


After  the  significant  factors  for  a  brigade  are  determined,  the  final  reduced 
ARMA  model  is  produced.  It  is  of  the  same  order  as  the  original  full  model. 
Correlograms  are  plotted  to  ensure  there  is  no  significant  ACF  or  partial  ACF  of  the 
residuals.  The  AIC  of  the  reduced  model  is  calculated  to  ensure  it  is  less  than  the  AIC  of 
the  original  full  model.  The  regression  coefficients  of  the  factors  remaining  in  the  final 
model  are  then  examined  to  ensure  that  they  have  the  same  sign  as  they  did  in  the  initial 
full  model.  The  presence  of  a  sign  change  is  not  necessarily  an  indication  that  the  model 
is  invalid,  but  it  prompts  scrutiny  of  the  data  and  the  factor  reduction  process. 

E.  FORECASTING 

Once  the  final  model  for  each  region  is  determined,  it  is  used  to  forecast  the 
number  of  high-quality  male  contracts  in  the  test  period.  Like  the  training  data,  the  test 
data is first centered on the mean of the respective variable from the training series.

A  simulation  is  used  to  develop  a  predicted  time  series  of  the  response  variable. 
The  length  of  the  predicted  time  series  is  nine  months,  corresponding  to  the  length  of  the 
test  data  period.  The  regressor  values  are  from  the  test  data.  The  simulation  uses  a 
parametric  approach  to  the  generation  of  the  random  errors.  The  mean  of  the  random 
errors  is  zero  and  the  standard  deviation  is  equal  to  the  standard  deviation  of  the  residuals 
from  the  final  model.  The  simulation  is  repeated  1,000  times  to  develop  multiple 
predictions  for  the  nine-period  series.  The  mean  and  standard  deviation  of  the  1,000 
predictions  for  each  month  of  these  simulated  series  are  used  to  make  the  forecast.  The 
code  for  producing  the  forecasts  is  contained  in  Appendix  B. 
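
The forecasting simulation can be illustrated with the Python sketch below. It is not the Appendix B code. It assumes a reduced model with regression terms and a single autoregressive term, and it assumes that each simulated value feeds the lag for the following month; both are simplifications chosen for brevity. All names and inputs are hypothetical.

```python
import random
import statistics


def simulate_forecasts(beta, phi, s_test, z_last, resid_sd, reps=1000, seed=0):
    """Simulate centered forecasts for each test month.

    beta     regression coefficients of the final model
    phi      single autoregressive parameter (AR(1) assumed)
    s_test   centered regressor values, one list per test month
    z_last   last centered observation of the training series
    resid_sd standard deviation of the final model's residuals
    Returns (per-month means, per-month standard deviations).
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(reps):
        prev, path = z_last, []
        for s_t in s_test:
            noise = rng.gauss(0.0, resid_sd)  # parametric normal error
            z_t = sum(b * v for b, v in zip(beta, s_t)) + phi * prev + noise
            path.append(z_t)
            prev = z_t
        paths.append(path)
    months = list(zip(*paths))
    means = [statistics.mean(m) for m in months]
    sds = [statistics.stdev(m) for m in months]
    return means, sds
```

The per-month means serve as the point forecast and the per-month standard deviations give the spread used for the plus and minus one standard deviation band.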

The  forecast  error  for  each  observation  in  the  9-period  time  series  is  defined  by 

\[ \Delta_t = z_t - \hat{z}_{t(\mathrm{forecast})}. \]

Once calculated, diagnostics of the forecast errors are performed. The $\Delta_t$ values are
plotted  to  determine  if  they  appear  randomly  distributed.  Correlograms  of  the  forecast 
errors  are  plotted  to  determine  if  they  exhibit  significant  ACF  or  partial  ACF. 

In  order  to  provide  results  that  can  be  conveniently  and  directly  compared  to  the 
original  data,  the  centering  procedure  required  for  ARMA  model  development  is  reversed 
as  follows 

$\hat{y}_{t(\mathrm{forecast})} = \hat{z}_{t(\mathrm{forecast})} + \bar{y}$.

The  percent  error  of  the  forecasts,  defined  as 

$\mathrm{percentError} = \dfrac{y_t - \hat{y}_{t(\mathrm{forecast})}}{y_t} \cdot 100,$

is also calculated for each of the nine forecast values. Finally, the actual behavior of
the response variable, $y_t$, the forecast value, $\hat{y}_{t(\mathrm{forecast})}$, and the forecast plus and minus one
standard deviation of the forecast are plotted to provide a visual tool for interpreting the
forecast’s accuracy.
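
As a worked illustration of the quantities defined in this section, the sketch below computes the forecast error on the centered scale, reverses the centering, and computes the percent error. It also checks whether the actual value falls inside the band of plus and minus one standard deviation. The function and key names are hypothetical, and the numbers in the usage lines are invented for the example.

```python
def forecast_diagnostics(y_actual, z_forecast, forecast_sd, y_train_mean):
    """Return one dict of diagnostics per forecast month.

    y_actual     : observed response values in the test period (original scale)
    z_forecast   : forecasts of the centered response
    forecast_sd  : standard deviation of the simulated forecasts for each month
    y_train_mean : training-set mean used to center the response
    """
    rows = []
    for y_t, z_hat, sd in zip(y_actual, z_forecast, forecast_sd):
        z_t = y_t - y_train_mean              # centered actual value
        delta_t = z_t - z_hat                 # forecast error on the centered scale
        y_hat = z_hat + y_train_mean          # centering reversed
        percent_error = (y_t - y_hat) / y_t * 100
        within_band = (y_hat - sd) <= y_t <= (y_hat + sd)
        rows.append({"delta": delta_t, "y_hat": y_hat,
                     "percent_error": percent_error,
                     "within_band": within_band})
    return rows


# Hypothetical two-month example
for row in forecast_diagnostics([200.0, 180.0], [-10.0, 5.0],
                                [25.0, 5.0], 190.0):
    print(row)
```

Because the same training mean is subtracted from the actual value and added back to the forecast, the error is identical on the centered and original scales.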
V. RESULTS


Models  are  developed  that  include  all  sixteen  factors  plus  monthly  indicators  as 
described in Chapter III. Based on the recommendations made by Goldhaber in his 1999
review  of  the  Navy’s  Enlisted  Goaling  Model,  models  that  exclude  all  advertising  data 
are  also  developed.  The  models  that  do  not  include  advertising  impression  data  are  more 
accurate  for  forecasting  as  measured  by  the  percent  error  of  the  forecasts.  Therefore,  all 
model  results  described  in  this  chapter  refer  to  models  that  initially  excluded  advertising 
impression  data.  The  behavior  of  the  advertising  time  series  is  still  examined  in  the 
Descriptive  Statistics  section. 

A.  DESCRIPTIVE  STATISTICS 

1.  Time  Series  Graphs 

A  basic  step  in  the  analysis  of  time  series  is  plotting  the  data  to  identify  trends, 
outliers,  seasonality,  and  other  cyclic  changes  (Chatfield,  1996).  The  series  for  the 
variables  in  each  brigade  are  reflected  in  Figures  5.1  through  5.5.  In  all  these  graphs,  the 
training  and  test  series  are  plotted  as  one  series.  A  vertical  line  between  December  1996 
and January 1997 indicates the division between these two sets. Note that the linear
construction of the four independent variables converted from annual to monthly data
(target population size, college attendance rates, high school graduate wages, and college
premium) precludes observation of seasonal behavior.
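
A short sketch shows why a linearly constructed monthly series cannot display seasonality. It assumes each annual observation is anchored to a single month and that the eleven months in between are filled along a straight line. The anchoring choice, the function name, and the values are assumptions for the example, not the thesis procedure.

```python
def annual_to_monthly(annual_values):
    """Linearly interpolate between consecutive annual observations.

    Returns 12 * (n - 1) + 1 monthly values for n annual observations.
    """
    monthly = []
    for start, end in zip(annual_values, annual_values[1:]):
        for m in range(12):
            monthly.append(start + (end - start) * m / 12)
    monthly.append(annual_values[-1])
    return monthly


# Hypothetical annual figures for three consecutive years
series = annual_to_monthly([100.0, 112.0, 106.0])
steps = [b - a for a, b in zip(series, series[1:])]
print(steps[:12])   # constant step within the first year
print(steps[12:])   # constant step within the second year
```

Within each year the month-to-month change is constant, so any within-year pattern in the underlying quantity is lost by construction.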

The  first  seven  time  series  graphs  address  eight  variables  specific  to  Army 
recruiting  and  USAREC  policies.  The  Army’s  high-quality  male  recruiting  shortages  are 
clearly  reflected  by  the  increasing  difference  between  the  recruiting  mission  and 
production  in  each  brigade.  Recruiting  production  demonstrates  clear  seasonality  with 
peaks  each  June.  Recruiter  strength  does  not  appear  seasonal,  but  assignments  increase 
noticeably  in  all  brigades  beginning  in  early  1997.  All  advertising  media  display  a 
decreased number of impressions around the months of June and July. Television and
radio demonstrate increasing trends, while magazine and newspaper impressions lack
clear trends. The percent of youth receiving the Army College Fund exhibits no
discernible trend or seasonality.

Factors  of  the  recruiting  environment  are  captured  in  the  next  five  time  series 
graphs.  In  all  brigades,  target  population  figures  initially  demonstrate  a  decrease  between 
0.2%  and  2.9%.  However,  in  all  but  the  First  Brigade  region,  there  is  a  net  growth  in  the 
target  population  between  1992  and  1997.  Relative  growth  is  greatest  in  Second  Brigade, 
followed  closely  by  Sixth  Brigade  (10.1%  and  9.9%  respectively).  In  all  regions, 
unemployment  displays  clear  seasonality  with  peaks  in  January  and  June  and  an  overall 
decreasing trend. High school graduate wages demonstrate a steady increase over time.
The  behavior  of  college  attendance  rates  differs  between  brigades.  All  regions  show  a  dip 
in  attendance  in  1994.  In  the  Fifth  and  Sixth  Brigade  regions,  there  is  a  net  decrease  in 
the  college  attendance  rate  over  the  period  examined,  while  the  other  three  regions 
experience  a  net  increase.  None  of  the  attendance  rate  changes  are  more  than  two 
percentage  points.  The  college  premium  exhibits  a  net  increase  in  all  brigades,  though  the 
behavior  varies  by  region.  The  relative  increase  is  largest  in  Third  Brigade  (43%)  and 
smallest  in  Second  Brigade  (10%). 

The  final  time  series  graphs  address  the  behavior  of  rival  services’  recruiting 
production  in  the  high-quality  male  demographic.  The  Air  Force,  Marine  Corps,  and 
Navy  all  display  a  decreasing  trend  until  about  1994,  after  which  the  mean  of  each  series 
appears  fairly  constant.  All  the  rival  services  demonstrate  seasonal  summer  peaks,  though 
their  occurrence  seems  to  vary  by  one  to  two  months,  with  the  Air  Force’s  peak  occurring 
later in the summer.

Figure 5.1 First Brigade Factor Time Series

Figure 5.2 Second Brigade Factor Time Series

Figure 5.3 Third Brigade Factor Time Series

Figure 5.4 Fifth Brigade Factor Time Series

Figure 5.5 Sixth Brigade Factor Time Series


2.  Time  Series  Structure  of  the  Variables 

The  correlograms  for  the  variable  series  in  all  brigades  are  contained  in  Appendix 
C.  They  prove  valuable  for  confirming  the  observations  of  seasonality  discerned  from  the 
time  series  graphs  and  also  for  identifying  cycles  that  are  not  visually  evident.  Only 
departures  from  graphical  observations,  additional  insights,  and  unexpected  results  are 
addressed  in  this  section. 
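
For readers who want to reproduce this kind of check, the sketch below computes a sample autocorrelation function and flags lags outside approximate 95% bounds of plus or minus 1.96 divided by the square root of n. The bounds are a common convention and an assumption here, since the significance limits used for the Appendix C correlograms are not restated in this section. The test series is synthetic.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% bounds."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k, r in enumerate(sample_acf(series, max_lag), start=1)
            if abs(r) > bound]


# Synthetic five-year monthly series with a single peak every June
june_peaks = [10.0 if month % 12 == 5 else 0.0 for month in range(60)]
print(significant_lags(june_peaks, 12))
```

A series with a once-a-year peak shows a significant autocorrelation at lag 12 but not at lag 6, which is the pattern used in this section to separate annual from biannual behavior.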

The  ACF  of  the  mission  variable  series  decays  to  an  insignificant  level  in  a  lag  of 
k=5  or  less  in  all  brigades.  This  behavior  is  somewhat  unexpected.  It  suggests  that 
USAREC  did  not  issue  contract  missions  to  its  subordinate  brigades  in  accordance  with 
the  known  seasonal  behavior  of  recruiting  production. 

TV  advertising  impressions  demonstrate  seasonality  with  significant  annual  ACF 
and  partial  ACF  figures  in  all  brigades.  Magazine  ad  impressions  reflect  clear  biannual 
peaks  in  ACF  at  6-  and  12-month  lags.  Radio  advertising  impressions  are  not  consistent 
across  the  country.  In  Second  and  Third  Brigades,  the  autocorrelation  functions  are 
significant  at  k=6.  First  Brigade  has  a  significant  positive  ACF  at  a  lag  of  12  months. 
Fifth and Sixth Brigades have a significant positive ACF at both k=6 and k=12.
Newspaper  advertising  impressions  display  no  evidence  of  periodicity  from  the  ACF. 

In  all  regions  except  Sixth  Brigade,  Air  Force  high-quality  male  contracts 
demonstrate significant peaks in ACF at lags of 6 and 12 months. In the Sixth Brigade
region, Air Force recruiting shows a significant peak in ACF only at 6 months, which is
unexpected.  The  Marine  Corps’s  production  in  this  demographic  reflects  the  strongest 
seasonal  behavior  of  any  service,  with  very  significant  ACF  at  lags  of  12  and  24  months 
and,  in  all  but  the  Third  Brigade  region,  clearly  significant  ACF  at  a  lag  of  36  months. 
The  ACFs’  behavior  also  confirms  the  fact  that  the  Marine  Corps’s  high-quality  male 
production  appears  the  most  consistent  of  all  the  services  with  little  trend  and  very  clear 
seasonality.  In  all  but  the  Sixth  Brigade  region,  the  ACF  for  the  Navy’s  high-quality  male 
recruiting  series  demonstrates  a  much  slower  decay  than  the  other  services,  indicating  a 
less seasonal behavior. However, the Navy production correlograms still have peaks
in the ACF at a lag of k=12.

3. Correlation Between Time Series

The  time  series  in  each  brigade  are  examined  for  high  values  of  simple  correlation 
between variables. Variables with high correlation present the potential for multicollinearity,
which can cause unstable models. The threshold for high correlation in this
study is defined as $\rho > 0.95$. Identification of correlation values above this level does not
represent a criterion for excluding variables from the initial model. However, the final
reduced models are examined to ensure that they do not contain both variables from any
pair with a high correlation value.
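
The screening step can be illustrated with a small pairwise check. The sketch below computes the simple correlation for every pair of series and reports pairs above the 0.95 threshold stated in the text. The variable names and data values are invented for the example.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Simple (Pearson) correlation between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def high_correlation_pairs(series_by_name, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with r above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(series_by_name.items(), 2):
        r = pearson(a, b)
        if r > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical short series, for illustration only
example = {
    "wages": [1.0, 2.0, 3.0, 4.0, 5.0],
    "attendance": [1.1, 2.0, 3.2, 3.9, 5.1],
    "unemployment": [5.0, 3.0, 4.0, 1.0, 2.0],
}
print(high_correlation_pairs(example))
```

A flagged pair is not removed at this stage. It only marks which pairs must not both survive into a reduced model.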

In  First  Brigade,  the  correlation  between  high  school  graduate  wages  and  the 
college  attendance  rate  is  0.97.  In  Third  Brigade,  the  correlation  between  these  same 
variables  is  0.95.  These  are  the  only  two  cases  of  correlation  above  the  designated 
threshold.  Both  variables  in  this  pair  are  from  data  that  was  transformed  into  monthly 
figures  by  linearizing  between  annual  observations.  In  both  First  and  Third  Brigades,  the 
college  attendance  rate  variable  is  eliminated  during  the  model  reduction  process. 

4.  Centered  Data 

In  order  to  meet  the  requirement  that  an  AR  model  must  be  developed  for  a  zero 
mean series, the data for each variable is centered on the respective training set mean,
$\bar{y}$ or $\bar{x}_i$. Because the order of the ARMA model that will be most effective is not known
in advance, the centered data is used for all model development. The series averages and
extrema for all variables in the full, test, and training data sets for each brigade are listed
in Appendix A.
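
The centering step is simple but easy to get wrong for the test set, which must use the training mean rather than its own. A minimal sketch, with a hypothetical function name and invented values:

```python
def center_on_training_mean(train, test):
    """Subtract the training-set mean from both the training and test series."""
    train_mean = sum(train) / len(train)
    centered_train = [v - train_mean for v in train]
    centered_test = [v - train_mean for v in test]
    return centered_train, centered_test, train_mean


# Hypothetical values for one variable
z_train, z_test, mean_used = center_on_training_mean([10.0, 20.0, 30.0], [40.0, 25.0])
print(z_train, z_test, mean_used)
```

The centered training series has mean zero by construction. The centered test series generally does not, which is expected.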

B.  MODEL  DEVELOPMENT 

1.  Initial  Model  Selection 

Strictly  autoregressive  (AR),  strictly  moving  average  (MA),  and  mixed 
autoregressive  moving  average  (ARMA)  models  are  all  explored  during  model 
development.  Additionally,  different  manipulations  of  the  centered  data  are  explored 
including a one-period lead of all predictive variables and logistic transformations.
Moving average models using the centered data prove the most effective for creating
models of the lowest order that have no significant ACF or PACF of the residuals.

The  selection  criteria  for  the  moving  average  models  are  contained  in  Table  5.1. 
In  this  table,  an  asterisk  in  the  “Moving  Average  Order”  column  indicates  the  model 
selected  for  step-wise  reduction. 


Brigade       Moving Average   Significant ACF or PACF                     AIC
              Order (q)        of residuals

1st Brigade   1 *              no                                          600.4
              2                no                                          601.4

2nd Brigade   1                no                                          639.4
              2                no                                          [illegible]
              3                no                                          633.5
              4 *              no                                          623.8
              5                no                                          628.1

3rd Brigade   1                no                                          658.8
              2 *              no                                          641.6
              3                yes - PACF at lag k = 8                     628.9

5th Brigade   1                yes - ACF at lag k = 2, PACF at lag k = 4   589.5
              2                yes - ACF at lag k = 2, PACF at lag k = 4   580.3
              3                no                                          564.3
              4                no                                          [illegible]
              5 *              no                                          [illegible]
              6                no                                          572.4

6th Brigade   1                yes - PACF at lag k = 7                     624.4
              2                yes - PACF at lag k = 7                     625.2
              3                yes - PACF at lag k = 4                     612.9
              4                yes - ACF at lag k = 4, PACF at lag k = 4   599.5
              5                yes - PACF at lag k = 6                     [illegible]
              6 *              no                                          [illegible]
              7                no                                          592.8

Table 5.1 Model Selection Criteria Measures

The histograms contained in Figure 5.6 show that the assumption of a normal
distribution of errors is plausible. First, Third, and Fifth Brigades’ residual histograms are
negatively skewed, while Second and Sixth Brigades’ are positively skewed.

Figure 5.6 Residual Distributions for the Initial Models in Each Brigade


2.  Stepwise  Model  Reduction 

Reduced models are produced with the stepwise recursion using both the
parametric and non-parametric means of sampling from the residual errors. The
difference in which factors appear in the final models produced by each method is
minimal. Only one variable in one of the five brigades differs between the two
techniques. This result supports the supposition that the residuals from the initial models
have normal distributions. The results of the non-parametric application of the stepwise
reduction recursion are contained in Tables 5.3 and 5.4. Table 5.3 lists the signs of the
regressor and moving average coefficients of the initial full models and those of the
significant factors in the reduced models. Table 5.4 lists the sign and magnitude of the
regressor and moving average coefficients in the reduced models.
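
The difference between the two sampling schemes can be shown on a deliberately simplified problem. The sketch below is not the stepwise recursion itself. It swaps in a one-regressor least squares fit on centered data in place of the multivariate moving average models, and it judges significance by whether a 95% percentile interval for the bootstrapped coefficient excludes zero. Both choices, along with all names and data, are assumptions for the example.

```python
import random
import statistics


def fit_slope(x, y):
    """Least squares slope through the origin (data assumed centered)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def bootstrap_slope_interval(x, y, parametric, n_boot=1000, seed=0):
    """95% percentile interval for the slope from resampled errors."""
    rng = random.Random(seed)
    slope = fit_slope(x, y)
    fitted = [slope * a for a in x]
    resid = [b - f for b, f in zip(y, fitted)]
    resid_sd = statistics.stdev(resid)
    slopes = []
    for _ in range(n_boot):
        if parametric:
            # Draw errors from a normal distribution with the residual sd
            errs = [rng.gauss(0.0, resid_sd) for _ in x]
        else:
            # Resample the observed residuals with replacement
            errs = [rng.choice(resid) for _ in x]
        y_star = [f + e for f, e in zip(fitted, errs)]
        slopes.append(fit_slope(x, y_star))
    slopes.sort()
    return slopes[int(0.025 * n_boot)], slopes[int(0.975 * n_boot) - 1]


def is_significant(interval):
    """A factor is kept when its interval does not contain zero."""
    return not (interval[0] <= 0.0 <= interval[1])


# Hypothetical centered regressor and response
x = [float(v) for v in range(-5, 6)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.2, -0.2]
y = [2.0 * a + e for a, e in zip(x, noise)]
print(is_significant(bootstrap_slope_interval(x, y, parametric=True)))
print(is_significant(bootstrap_slope_interval(x, y, parametric=False)))
```

When the residuals are close to normal, the two schemes should flag nearly the same factors, which is the agreement reported in this section.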


Final Models for Centered 54-month Data

Columns: Index; Factor; Full Model and Reduced signs for the 1st, 2nd, 3rd, 5th, and
6th BDE; # BDEs factor significant; # BDEs factor sig and +; # BDEs factor sig and -.
Blank cells did not survive, so the entries below are listed in reading order and are not
aligned to columns.

Index  Factor       Signs                        Counts
1      MISSION      + + + [illegible] + + + +    3 3
2      RECRUITERS   - - + [illegible] + + - -    3 1 2
3      TV ADS       [illegible]
4      RADIOADS     [illegible]
6      NEWSPADS
7      COLOPTION    - - - + + + + +              3 2 1
8      TGTPOP       + - - - - -                  2 2
9      UNEMP        + + - - - - - - - -

[Remaining rows illegible.]

Table 5.3 Sign of Significant Factors and Moving Average Coefficients for Full and
Reduced Models

Final Models for Centered 54-month Data

[Table body not recoverable.]

Table 5.4 Sign and Magnitude of the Reduced Model Regressors and Moving
Average Coefficients


The  total  number  of  significant  factors  in  each  brigade  varies  between  nine  and 
sixteen.  The  number  of  significant  continuous  predictor  variables  in  a  brigade  ranges 
from  five  to  nine,  while  the  number  of  significant  monthly  indicators  varies  between  two 
and  ten.  All  of  the  continuous  variables  are  significant  in  at  least  two  brigades,  and  all  of 
the monthly indicators are significant in at least one brigade. Interpretation of the
significant  regressors  is  addressed  in  Section  D. 

3.  Reduced  Model  Diagnostics 

None  of  the  residuals  from  the  final  reduced  models  display  significant  ACF  or 
partial  ACF.  In  all  cases,  the  reduced  models  have  a  lower  AIC  value  than  the  initial  full 
models, as reflected in Table 5.5. Because the moving average order of the models does not
change during the reduction process, the smaller AIC values stem from removing regressors
whose contribution to the fit did not offset the penalty for their parameters.


Brigade       Initial Full Model AIC   Reduced Model AIC

1st Brigade   600                      591
2nd Brigade   623                      611
3rd Brigade   642                      642
5th Brigade   [illegible]              546
6th Brigade   591                      577

Table 5.5 Akaike Information Criteria Values for the Full and Reduced Models
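
The way AIC trades fit against parameter count can be seen in a small calculation. The sketch below uses the Gaussian form n ln(RSS/n) + 2k, which omits an additive constant and may differ from the definition used by the software behind Table 5.5. The residual sums of squares and parameter counts are invented. Only the 54 training observations come from the text.

```python
import math


def gaussian_aic(rss, n_obs, n_params):
    """AIC for a Gaussian model, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


n_obs = 54  # training months

# Hypothetical full model: more parameters, slightly smaller residual sum of squares
aic_full = gaussian_aic(rss=1000.0, n_obs=n_obs, n_params=28)

# Hypothetical reduced model: fewer parameters, slightly larger residual sum of squares
aic_reduced = gaussian_aic(rss=1050.0, n_obs=n_obs, n_params=20)

print(round(aic_full, 1), round(aic_reduced, 1))
```

In this example the reduced model has the lower AIC even though its residual sum of squares is larger, because eight fewer parameters are penalized.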


C.  FORECASTING 

1.  Nine-month  Forecasts 

The  test  set  data  and  final  reduced  models  are  used  in  a  simulation  to  produce  a 
nine-month  predicted  time  series  of  the  centered  response  variable,  zt.  The  percent  error 
of  the  forecast  for  each  month,  as  defined  in  Chapter  IV,  is  calculated  and  reflected  in 
Table 5.6. The figures marked with an asterisk represent the cases in which the
confidence interval contains the known response variable value.

[Table body not recoverable; surviving entries: 23.7, 29.4.]

Table 5.6 Percent Error for Each Period of the Nine-Month Forecasts


The mean predicted value of the response variable, $\hat{y}_{t(\mathrm{forecast})}$, and a confidence
interval of +/- 1 standard deviation of each forecast are plotted along with the known
values of the response variable. The last three months of the training period and the
predicted time series for each brigade are reflected in Figures 5.7 through 5.11. These
graphs reveal that the forecasts do capture the general behavior of recruiting production
during the test period.

Figure 5.7 First Brigade Forecast

Figure 5.8 Second Brigade Forecast

Figure 5.9 Third Brigade Forecast

Figure 5.10 Fifth Brigade Forecast

Figure 5.11 Sixth Brigade Forecast

[Each figure is titled by brigade ("Nine-Month Forecast, January 1997 to September 1997") and plots Recruiting Contracts by month from October 1996 to September 1997.]

2. Forecast Error Diagnostics

The forecast errors, $\Delta_t$, are calculated and plotted to determine if they appear
randomly distributed. The forecast error plots are shown in Figure 5.12.


[Five panels, one per brigade (First, Second, Third, Fifth, and Sixth BDE Forecast Error), each plotted against Forecasted Month, 1 through 9.]

Figure 5.12 Plot of the forecast errors for each brigade

The  limited  number  of  forecasts  makes  it  difficult  to  discern  if  the  errors  are  randomly 
distributed.  With  the  exception  of  Fifth  Brigade,  there  are  no  apparent  relationships  that 
cause  concern,  such  as  errors  increasing  with  forecast  length  or  similarities  in  the 
distributions  across  brigades.  The  errors  in  Fifth  Brigade  demonstrate  a  near-linear 
decrease  in  the  first  six  periods  of  the  forecast  series,  but  the  last  three  periods  appear  to 
return to a random pattern. None of the forecast errors display significant ACF or partial
ACF.

D. DISCUSSION


1.  Regressor  Coefficient  Interpretation 

Analysis  of  the  significant  factors  in  the  final  reduced  models  addresses  three  of 
the primary objectives of this thesis: validating factors from previous research, exploring
recent suppositions regarding prominent factors, and enumerating the differences in
the recruiting environment throughout the country. In general, the presence of factors in
the final models and the signs of their respective coefficients do not provide clear insight
into the recruiting process. The significant factors vary considerably across brigades. In
many cases, the signs of the coefficients for the same significant factors are inconsistent
between brigades, and some are inconsistent with prior research. The significant factors
and the regressor coefficient signs are listed in Table 5.3.

At  least  one  USAREC  policy  variable  remains  in  the  final  model  of  each  brigade. 
Mission  is  significant  in  three  brigades  and  the  coefficients’  sign  is  positive  as  expected, 
meaning  that  recruiting  production  increases  when  USAREC  issues  higher  quotas. 
Recruiter  strength  is  significant  in  three  brigades.  In  Fifth  Brigade,  increasing  recruiter 
strength  has  a  measurable  positive  effect  on  production.  However,  in  First  and  Sixth 
Brigades,  the  sign  of  the  coefficient  for  this  variable  is  negative.  This  result  initially 
appears  counter-intuitive,  but  examination  of  the  time  series  graphs  reveals  that,  in  these 
brigades,  increases  in  recruiter  strength  did  not  provide  the  desired  effect  of  boosting 
production.  The  number  of  eligible  recruits  receiving  the  college  option  is  significant  in 
three  brigades.  Like  the  behavior  of  the  college  option  time  series,  the  impact  is  not 
consistent,  since  the  effect  of  the  variable  is  positive  in  only  two  of  these  brigades. 

The  reduced  models  for  each  brigade  contain  at  least  two  factors  of  the  recruiting 
environment. The target population variable is significant in two brigades, and in both
the coefficient sign is negative. This result is counter-intuitive. Unemployment is a
significant variable in all brigades, which is in line with prior research. However, the
expected sign of the coefficient (positive) is present in only one brigade. Once again, this
result may be partially explained by examining the behavior of the production and
unemployment time series. Over the period of this study, unemployment decreases in all
regions,  yet  production  remains  fairly  constant.  The  high  school  graduate  wage  level, 
which  is  also  prominent  in  previous  studies,  is  significant  in  all  brigades.  The  variable 
coefficients  display  the  expected  sign  (negative)  in  only  three  of  the  five  brigades.  The 
college  attendance  rate  appears  in  the  reduced  models  of  two  brigades,  though  the  impact 
of  the  variable  on  recruiting  production  is  positive  and  not  negative  as  expected.  The 
college  premium  variable  is  significant  in  three  brigades  and  behaves  as  expected  only  in 
First  Brigade. 

At  least  one  of  the  rival  services’  recruiting  production  variables  appears  in  the 
final  model  of  each  brigade.  Air  Force  high-quality  male  recruiting  contracts  are 
significant  in  three  brigades,  and  are  positive  predictors  in  two  of  these  regions.  Marine 
Corps  and  Navy  high-quality  male  recruiting  variables  are  significant  in  four  brigades. 
The  Marine  Corps  production  figures  are  negatively  related  to  Army  production  figures 
in  two  brigades.  The  Navy  production  variable  is  the  most  consistent.  Its  behavior  is 
positively  related  to  Army  production  in  all  four  brigades.  In  Sixth  Brigade  all  rival 
service  figures  are  significant  and  are  positive  predictors. 

The signs of the coefficients for the monthly indicators behave as expected in all
but one case. December is consistently a difficult month, while June is a prolific month
for  recruiting.  September  is  a  significant  and  positive  month  in  four  of  the  brigades.  In 
general,  the  winter  months  are  negative,  but  not  significant  in  all  regions.  All  of  these 
results  are  consistent  with  known  recruiting  production  behavior.  The  negative 
coefficient  for  the  July  indicator  in  Sixth  Brigade’s  final  model  represents  the  one 
counter-intuitive  result  for  the  binary  monthly  variables. 

2.  Model  Form 

It  is  difficult  to  draw  conclusions  regarding  the  first  three  thesis  objectives 
because  it  is  unclear  whether  the  inconsistencies  in  the  final  models  are  due  to  random 
noise,  inappropriate  model  form,  new  realities  of  the  recruiting  environment,  and/or 
legitimate  differences  in  the  recruiting  process  across  the  country.  The  final  models  also 
produce disappointing results regarding the fourth thesis objective, which is to develop an
accurate tool for forecasting recruiting production. Chatfield provides some insight into a
potential explanation. He states that “building a ‘good’ model from data subject to
feedback can be difficult” (Chatfield, 1996). Feedback exists in systems in which the
outputs  in  one  period  affect  the  inputs  in  future  periods.  In  a  discussion  about 
econometric  models,  he  states  that  in  processes  with  rapid  feedback  loops,  a  single 
multivariate  equation  is  less  appropriate  than  modeling  the  system  with  multiple 
equations.  However,  he  maintains  that  a  multivariate  model  may  still  prove  useful  if  the 
feedback  in  the  system  is  slow  and  if  the  overall  system  is  not  well  controlled  by  the 
inputs,  as  in  the  case  of  the  economy. 

Chatfield’s observation may provide an explanation of why model accuracy
improves when advertising data is removed. Clearly, recruiting production results in one
period  have  an  impact  on  decisions  about  advertising  expenditures  in  future  periods. 
Concern  about  feedback  raises  the  issue  of  whether  all  factors  under  USAREC’s  control 
should  be  eliminated  from  the  model  or  whether  multiple  models  need  to  be  developed. 
Removal  of  all  variables  representing  policies  and  resources  under  USAREC’s  control  is 
unappealing  because  the  resultant  models  would  not  provide  any  basis  for  analyzing 
policy  and  resource  allocation.  The  feedback  loop  from  production  results  to  changes  in 
recruiter  strength  is  much  slower  than  the  response  time  in  advertising  feedback,  which  is 
an  argument  for  maintaining  this  factor  in  model  development.  Based  on  the  rapid 
feedback  of  production  results  on  future  missions  and  the  ease  with  which  missions  can 
be  changed,  there  may  be  a  legitimate  claim  that  inclusion  of  this  variable  has  the 
potential  to  induce  bias  in  the  model  coefficients.  However,  the  reduced  models  that 
contain  mission  as  a  significant  variable  are  no  more  or  less  accurate  than  those  in  which 
this  factor  is  eliminated  during  the  reduction  process. 



VI.  CONCLUSIONS  AND  RECOMMENDATIONS 


A.  SUMMARY 

The  United  States  Army  is  currently  experiencing  the  greatest  recruiting  shortfalls 
in  the  history  of  the  All-Volunteer  Force.  The  service  faces  unprecedented  competition 
for  young  people  as  unemployment  is  at  its  lowest  level  in  thirty  years  and  college 
attendance  rates  are  the  highest  in  American  history.  Under  these  conditions,  USAREC 
requires  tools  to  quantify  the  impact  of  factors  in  the  recruiting  environment,  to  identify 
differences  in  the  recruiting  processes  across  its  five  regional  subordinate  units,  and  to 
measure  the  effectiveness  of  its  policies  and  resource  expenditures. 

This  thesis  examined  recruiting  data  from  July  1992  to  September  1997,  a  very 
dynamic  period  for  the  Army  and  Recruiting  Command.  The  scope  was  limited  to  the 
high-quality  male  demographic,  defined  as  a  recruit  who  scored  above  the  50th  percentile 
on  the  AFQT  and  who  is  a  high  school  graduate  or  GED  holder.  This  thesis  aimed  to  do 
the  following: 

1.  Validate  factors  from  previous  models  on  more  current  data. 

2.  Explore  suppositions  regarding  new  influences  on  the  recruiting  environment. 

3.  Enumerate  the  differences  in  the  recruiting  environment  throughout  the 
country. 

4.  Develop  an  accurate  tool  for  predicting  recruiting  production. 

Multivariate  time  series  analysis  was  used  to  predict  the  number  of  enlistment 
contracts  signed  in  a  month  as  a  function  of  exogenous  and  endogenous  factors  plus 
monthly  indicators.  Fifteen  factors  were  initially  included  for  examination  in  this  study  as 
predictive  variables.  They  were  selected  based  on  their  appearance  in  previous  models  or 
in recent research. The bootstrap was used to overcome the difficulties in determining
significant factors presented by the short duration of the recruiting data time series. This
technique allowed resampling from within the existing data to provide robustness in the
factor determination process. A stepwise recursion was developed to eliminate from the
time series models factors that were not statistically significant. The factors remaining in
the  reduced  models  were  compared  to  those  found  to  be  significant  in  past  research.  The 
developed  models  were  also  used  to  create  nine-month  projections  of  recruiting 
production.  The  results  were  then  compared  to  known  production  figures  from  test  set 
data  to  determine  forecast  accuracy  levels. 

The  final  models  indicated  that  unemployment  figures  and  high  school  graduate 
wage  levels  are  significant  factors  for  predicting  recruiting  production.  These  results  were 
consistent  with  findings  from  previous  studies.  However,  the  impact  of  these  two  factors 
was  not  clearly  interpretable  across  the  five  recruiting  brigades.  In  some  brigades  the 
effect  of  these  variables  on  recruiting  production  was  positive,  while  in  other  brigades  the 
effect  was  negative.  No  consistent  factors  for  measuring  the  competition  between  the 
Army  and  post-secondary  schooling  emerged  from  the  model  development  process.  The 
final  models  did  successfully  capture  the  seasonal  nature  of  recruiting.  There  were 
considerable differences in the final model for each brigade, which, despite probable noise
in  the  data,  indicated  that  influential  predictors  of  recruiting  production  differ  regionally. 
The  forecasts  produced  using  the  final  models  captured  the  general  behavior  of  the 
recruiting  production  series  in  the  test  period,  but  overall  their  accuracy  was 
disappointing. 

B.  CONCLUSIONS 

The 1990s were a dynamic period for U.S. Army recruiting. Predictions based on
the past are dependent on the assumption that past patterns within each series and the
relationships between series remain the same. Clearly, the recruiting process and
environment were evolving rapidly over the period of this study. This evolution created
noise  in  the  data  used  in  this  thesis.  Noise  was  likely  also  induced  by  the  transformations 
required  to  convert  some  of  the  available  data  into  a  useable  form  for  time  series  analysis. 
The  results  do  support  the  intuition  that  the  influential  factors  differ  by  region,  a  subject 
not  addressed  in  the  previous  studies  reviewed.  Though  the  models  developed  in  this 


thesis  may  represent  a  descriptive  tool  for  what  occurred  in  the  recruiting  process  during 
the  period  studied,  they  lack  the  forecasting  accuracy  to  provide  legitimate  opportunities 
for  “what  if’  analysis  or  optimization. 

The most meaningful contribution of this thesis is the development of the
stepwise recursion using bootstrap simulation for identifying significant factors in
multivariate time series analysis. It proved to be a useful tool for providing robustness in
a situation where data were limited. This methodology offers potential for further
refinement and application.
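
The idea behind the recursion can be illustrated with a simplified sketch. The Python code below is not the thesis implementation (the S-Plus code in Appendix B fits ARIMA models with regressors); it assumes an ordinary least-squares regression with no ARIMA error structure, and the names `ols`, `bootstrap_coefs`, and `backward_eliminate` are hypothetical. It keeps the same discard rule: a factor is a removal candidate when zero lies within the bootstrap mean plus or minus a tolerance times the bootstrap standard deviation, and the candidate whose bootstrap coefficients fall closest to zero most often is dropped.

```python
import random
import statistics


def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))]
         for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[c][k] / A[c][c] for c in range(k)]


def bootstrap_coefs(X, y, trials, rng):
    """Residual bootstrap: refit on fitted values plus resampled residuals."""
    beta = ols(X, y)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    draws = []
    for _ in range(trials):
        y_star = [f + rng.choice(resid) for f in fitted]
        draws.append(ols(X, y_star))
    return draws


def backward_eliminate(X, y, names, sd_range=1.0, trials=200, seed=0):
    """Drop one factor per pass until no interval mean +/- sd_range*sd holds 0."""
    rng = random.Random(seed)
    names = list(names)
    X = [list(row) for row in X]
    while len(names) > 1:
        draws = bootstrap_coefs(X, y, trials, rng)
        proportion = []
        for j in range(len(names)):
            col = [d[j] for d in draws]
            m, s = statistics.mean(col), statistics.stdev(col)
            if m - sd_range * s <= 0 <= m + sd_range * s:
                near_zero = sum(1 for c in col if abs(c) < sd_range * s)
                proportion.append(near_zero / trials)
            else:
                proportion.append(0.0)
        if max(proportion) == 0:
            break  # every remaining factor is judged significant
        drop = proportion.index(max(proportion))
        del names[drop]
        X = [row[:drop] + row[drop + 1:] for row in X]
    return names


# Made-up data: y depends on x1 only; x2 is constructed to be irrelevant.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, 1, -1, -1, -1, -1, 1, 1]
noise = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5, 0.5]
y = [2 * a + e for a, e in zip(x1, noise)]
kept = backward_eliminate([[a, b] for a, b in zip(x1, x2)], y, ["x1", "x2"])
print("Factors kept:", kept)
```

Because each pass refits the model after a single removal, the bootstrap distributions of the remaining coefficients are re-estimated at every step, which is the property that makes the procedure a recursion rather than a one-shot screen.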

C.  RECOMMENDATIONS  FOR  FURTHER  RESEARCH 

Recommendations for improvements to the recruiting research fall into two
categories: data collection and data handling. Future studies should attempt to have all
predictive data specific to the targeted demographic. For example, college attendance
rates, unemployment, and wage figures should address 17-24 year-old males only. All
original data should consist of monthly observations. Additional indicator variables
should be developed to represent additional policies, organizational structure, incentive
programs, and specific events, such as military conflicts or government shutdowns.
Multiple variables could be combined into single variable representations and
cointegration vectors could be developed. Advanced time series analysis techniques,
including smoothing and filtering, should be explored.

The stepwise reduction recursion using bootstrap simulation merits further
exploration. A first step should be testing the method with data for which accurate results
have been determined through other techniques. An examination of the impact of
changing the α values, the parameter controlling the tolerance with which variables are
removed from the model, is warranted. A method that uses incremental changes in the α
values could be developed to rank order the significance of factors. Additionally, the
number of model subsets examined in the model development process could be expanded
by the creation of a forward addition stepwise recursion. These recommendations offer
means to further develop the potential shown by this technique for multivariate time
series analysis.
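
One way the suggested ranking might work is sketched below in Python. This is an assumption about how such a method could be built, not a procedure from this thesis: for each factor, the tolerance is increased in small steps until zero first falls inside the interval (bootstrap mean plus or minus tolerance times bootstrap standard deviation), and factors are ordered by the tolerance at which that happens. The function names, the grid, and the bootstrap draws are all hypothetical.

```python
import statistics


def removal_threshold(draws, grid):
    """Smallest tolerance in grid for which mean +/- tol * sd contains zero."""
    m = statistics.mean(draws)
    s = statistics.stdev(draws)
    for tol in grid:
        if m - tol * s <= 0 <= m + tol * s:
            return tol
    return None  # zero never enters the interval on this grid


def rank_factors(boot, grid):
    """Order factors from most to least significant by removal threshold."""
    thresholds = {name: removal_threshold(d, grid) for name, d in boot.items()}

    def key(name):
        t = thresholds[name]
        return float("inf") if t is None else t

    return sorted(boot, key=key, reverse=True), thresholds


# Made-up bootstrap coefficient draws for three hypothetical factors.
boot = {
    "factor_a": [0.9, 1.0, 1.1, 1.0],
    "factor_b": [0.1, 0.3, 0.2, 0.2],
    "factor_c": [-0.1, 0.1, 0.0, 0.05],
}
grid = [0.5 * k for k in range(1, 9)]  # tolerances 0.5, 1.0, ..., 4.0

order, thresholds = rank_factors(boot, grid)
print(order, thresholds)
```

A factor that survives even the largest tolerance on the grid ranks first; a factor that becomes removable at the smallest tolerance ranks last.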
APPENDIX A. ORIGINAL DATA SERIES SUMMARIES
Second Brigade Data Summary
Fifth Brigade Data Summary
Sixth Brigade Data Summary
APPENDIX B. S-PLUS CODE

Bootstrap  Recursion  S-Plus  Code 


bootstrapSim <- function(mod, y, xreg, n = 250, parametric = F)
{
    #
    # bootstrapSim: Simulate ARIMA models for the bootstrap
    #
    # args: mod:        arima model
    #       y:          response series (looked up from mod$series if missing)
    #       xreg:       matrix of regression variables
    #       n:          Number of trials
    #       parametric: Use parametric (Normal-based) bootstrap? Default F
    #
    if(class(mod) != "arima")
        stop("This function operates on an arima model")
    #
    # Set up output
    #
    this.is.ar <- this.is.ma <- F
    xreg.out <- matrix(NA, n, length(mod$reg.coef))
    if(any(names(mod$model) == "ar")) {
        this.is.ar <- T
        ar.out <- matrix(0, n, mod$model$order[1])
    }
    if(any(names(mod$model) == "ma")) {
        this.is.ma <- T
        ma.out <- matrix(0, n, mod$model$order[3])
    }
    #
    # Extract residuals
    #
    resids <- arima.diag(mod, resid = T, plot = F)$resid
    r.len <- length(resids)
    #
    # If data is missing, try to find it.
    #
    if(missing(xreg)) {
        name <- as.character(mod$call)[4]
        if(!exists(name))
            stop(paste("Can't find regressors in", name))
        xreg <- get(name)
    }
    if(missing(y)) {
        name <- mod$series
        if(!exists(name))
            stop(paste("Can't find y data in", name))
        y <- get(name)
    }
    #
    # Main loop
    #
    if(mod$model$order[1] != 0) {
        skip <- mod$model$order[1]
        first.resids <- y[1:skip] - xreg[1:skip, ] %*% mod$reg.coef
        #
        # The previous step accounts for the start-up cost of AR models and
        # fills in the p missing residuals with approximations.
        #
        for(i in 1:n) {
            if(i %% 100 == 0) {
                cat("Operating on loop ", i, "\n")
            }
            if(parametric) {
                cat("parametric\n")
                resid.sd <- sqrt(var(resids))
                new.y <- arima.sim(mod$model, xreg = xreg,
                    reg.coef = mod$reg.coef,
                    innov = rnorm(n = length(resids), sd = resid.sd))
            }
            else {
                # Resample the residuals that follow the AR start-up period.
                new.resids <- c(first.resids,
                    resids[sample((skip + 1):r.len, replace = T)])
                new.y <- arima.sim(mod$model, xreg = xreg,
                    reg.coef = mod$reg.coef, innov = new.resids)
            }
            new.model <- arima.mle(new.y, model = mod$model, xreg = xreg,
                max.fcal = 400, max.iter = 250)
            if(new.model$converged == F) {
                cat("Warning: model", i, "didn't converge!\n")
            }
            else {
                if(this.is.ar)
                    ar.out[i, ] <- new.model$model$ar
                if(this.is.ma)
                    ma.out[i, ] <- new.model$model$ma
                xreg.out[i, ] <- new.model$reg.coef
            }
        }
    }
    else {
        first.resids <- y - xreg %*% mod$reg.coef
        for(i in 1:n) {
            if(i %% 100 == 0) {
                cat("Operating on loop ", i, "\n")
            }
            new.resids <- resids[sample(1:r.len, replace = T)]
            if(parametric) {
                cat("parametric\n")
                resid.sd <- sqrt(var(resids))
                new.y <- arima.sim(mod$model, xreg = xreg,
                    reg.coef = mod$reg.coef,
                    innov = rnorm(n = length(resids), sd = resid.sd))
                #
                # If "innov" is supplied, it should be a vector,
                # e.g. innov = rnorm(n = length(resids), sd = resid.sd)
                # If "innov" is NOT supplied, then a vector of innovations is
                # generated by the rand.gen() function, which, by default, is
                # rnorm. Additional arguments to this function can be passed
                # as arguments to arima.sim,
                # e.g. [innov = not passed], rand.gen = rnorm,
                # n = length(resids), sd = resid.sd
                #
            }
            else {
                new.y <- arima.sim(mod$model, xreg = xreg,
                    reg.coef = mod$reg.coef, innov = new.resids)
            }
            new.model <- arima.mle(new.y, model = mod$model,
                xreg = xreg, max.fcal = 300, max.iter = 150)
            if(new.model$converged == F) {
                cat("Warning: model", i, "didn't converge!\n")
            }
            else {
                if(this.is.ma)
                    ma.out[i, ] <- new.model$model$ma
                xreg.out[i, ] <- new.model$reg.coef
            }
        }
    }
    if(this.is.ar)
        if(this.is.ma)
            return(list(Xreg = xreg.out, AR = ar.out, MA = ma.out))
        else return(list(Xreg = xreg.out, AR = ar.out))
    else return(list(Xreg = xreg.out, MA = ma.out))
}
Stepwise Reduction Recursion S-Plus Code


Stepwise <- function(mod, y, regressors, n = 1000, parametric = F, SD.range = 1,
    maximumIterations = 1)
{
    #
    # Stepwise: eliminate time-series regressors by backward elimination.
    #
    # Arguments: mod:               arima model
    #            y:                 y data vector (response variable)
    #            regressors:        matrix of regressors
    #            n:                 n to be passed to bootstrapSim
    #            parametric:        to be passed to bootstrapSim
    #            SD.range:          a tolerance for deciding when to stop deleting
    #                               columns. Stop deleting regressors when no column
    #                               of regression coefficients has "0" in the range
    #                               (mean +/- SD.range * SD).
    #            maximumIterations: maximum number of times to run through the
    #                               discarding loop. Default is 1.
    #
    stillDiscarding <- T
    counter <- 0
    #
    # Save "y" in frame 1 so "arima.diag" and others can find it if needed
    #
    assign("y", y, frame = 1)
    while(stillDiscarding && counter < maximumIterations) {
        counter <- (counter + 1)
        #
        # Print the current columns and run bootstrapSim.
        #
        cat("Loop ", counter, ", cols are ",
            dimnames(regressors)[[2]], "\n")
        cat("Calling bootstrapSim to execute bootstrap \n")
        bsRegCoefs <- bootstrapSim(mod, y, regressors, n = n,
            parametric = parametric)$Xreg
        proportion <- vector("single", ncol(bsRegCoefs))
        #
        # For each column of the resulting bootstrap regression coefficients,
        # find the proportion of values that fall within SD.range * sd of 0.
        # The objective is to discard the column with the largest such
        # proportion; that's the column in which most of the values are
        # close to 0. Each column of bsRegCoefs corresponds to a factor in
        # the time series.
        #
        for(i in 1:(ncol(bsRegCoefs) - 1)) {
            cat("Examining factor ",
                dimnames(regressors)[[2]][i], "\n")
            numerator <- 0
            x.bar <- mean(bsRegCoefs[, i], na.rm = T)
            if(na.sum <- sum(is.na(bsRegCoefs[, i])))
                cat(paste("Encountered", na.sum, "missing coeffs\n"))
            sd <- sqrt(var(bsRegCoefs[, i], na.method = "omit"))
            #
            # If 0 is in the range of the mean +/- SD.range * sd, calculate
            # proportion. If not, assign that column a proportion of 0.
            #
            if(x.bar - SD.range * sd <= 0 && x.bar + SD.range * sd >= 0) {
                for(j in 1:n) {
                    if(!is.na(bsRegCoefs[j, i])) {
                        if(bsRegCoefs[j, i] < SD.range * sd
                            && bsRegCoefs[j, i] > -(SD.range * sd)) {
                            numerator <- numerator + 1
                        }
                    }
                }
                proportion[i] <- numerator/n
                cat("Proportion is ", proportion[i], "\n")
            }
            else proportion[i] <- 0
        }
        #
        # After examining all factors, choose the factor which has the highest
        # proportion. If all proportions are equal to 0, set flag to exit
        # "while" loop.
        #
        if(all(proportion == 0)) {
            cat("No additional factors discarded \n")
            stillDiscarding <- F
        }
        else {
            max.index <- (1:length(proportion))[proportion ==
                max(proportion)][1]
            cat("Discarding ",
                dimnames(regressors)[[2]][max.index], "\n")
            regressors <- regressors[, -max.index]
            assign("regressors", regressors, frame = 1)
            mod <- arima.mle(y, model = list(order = mod$model$order),
                xreg = regressors, max.fcal = 300, max.iter = 150)
        }
    }
    cat("final significant factors are: ", dimnames(regressors)[[2]],
        "\n")
    return(regressors)
}


Forecasting  Simulation  S-Plus  Code 


makePrediction <- function(mod, N, XREG, regressionCoef, loops, ...)
{
    #
    # makePrediction: generates multiple ARIMA simulations to predict a
    #                 univariate time series. Returns the mean and sd of
    #                 the predictions for each period in the series.
    #                 Also returns a histogram of the simulated values for
    #                 the first period in the predicted time series. The
    #                 ... notation allows the user to specify how the
    #                 innovations for arima.sim are created.
    #
    # args: mod:            the order for the ARIMA model
    #       N:              the number of periods in the desired time series
    #       XREG:           a matrix of regression variable values
    #       regressionCoef: a vector of regression coefficients corresponding
    #                       to xreg
    #       loops:          the number of simulations to run
    #
    # One row per simulation, one column per forecast period
    # (N = 9 for the nine-month projections).
    predOut <- matrix(nrow = loops, ncol = N)
    for(i in 1:loops) {
        x <- arima.sim(model = mod, n = N, xreg = XREG,
            reg.coef = regressionCoef, ...)
        predOut[i, ] <- x
    }
    mean <- apply(predOut, 2, mean)
    var <- apply(predOut, 2, var)
    sd <- sqrt(var)
    hist(predOut[, 1], nclass = 20)
    return(mean, sd)
}
APPENDIX C. ORIGINAL DATA SERIES CORRELOGRAMS
Newspaper Ad Impressions
Third Brigade Autocorrelation Function Correlograms
Newspaper Ad Impressions